Repro steps: - Create a DAG with a dummy task - Let this DAG run for one dagrun - Add a new subdag operator that contains a dummy operator to this DAG that has depends_on_past set to true - click on the white square for the new subdag operator in the DAGs first dagrun - Click "Zoom into subdag" (takes you to the graph view for the subdag) - Click the dummy task in the graph view and click "Mark Success" - Observe that the list of tasks to mark as success is empty (it should contain the dummy task)
On Mon, Feb 27, 2017 at 1:03 PM, Bolke de Bruin <[email protected]> wrote: > Dan > > Can you elaborate on 2, cause I thought I specifically took care of that. > > Cheers > Bolke > > Sent from my iPhone > > > On 27 Feb 2017, at 20:27, Dan Davydov <[email protected]> > wrote: > > > > I created https://issues.apache.org/jira/browse/AIRFLOW-921 to track the > > pending issues. > > > > There are two more issues we found which I included there: > > 1. Task instances that have their state manually set to running make the > UI > > for their DAG unable to parse > > 2. Mark success doesn't work for non existent task instances/dagruns > which > > breaks the subdag use case (setting tasks as successful via the graph > view) > > > >> On Mon, Feb 27, 2017 at 11:06 AM, Bolke de Bruin <[email protected]> > wrote: > >> > >> Hey Max > >> > >> It is massive for sure. Sorry about that ;-). However it is not as > massive > >> as you might deduct from a first view. 0) run tasks concurrently across > dag > >> runs 1) ordering of the tasks was added to the loop. 2) calculating of > >> deadlocks, running tasks, tasks to run was corrected, 3) relying on the > >> executor for status updates was replaced, 4) (tbd) executor failure > check > >> to protect against endless Ioops. > >> > >> 0+1 seem bigger than they are due to the amount of lines changed. 2 is a > >> subtle change, that touches a couple of lines to pop/push properly. 3) > is > >> bigger, as I didn't like the reliance on the executor. 4) is old code > that > >> needs to be added again. > >> > >> I probably can leave out 3 which makes 4 mood. The change would be > >> smaller. Maybe I could even completely remove 3 and just add 4. What are > >> your thoughts? > >> > >> The random failures we were seeing were the "implicit" test of not a > >> executing in the right order and then deadlocking. But no explicit tests > >> exist. Help would definitely be appreciated. > >> > >> Yes I thought about using the scheduler and/or reusing logic from the > >> scheduler. I even experimented a little with it but it didn't allow me > to > >> pass the tests effectively. > >> > >> What I am planning to do is split the function and make it unit testable > >> if you agree with the current approach. > >> > >> Bolke > >> > >> Sent from my iPhone > >> > >>> On 27 Feb 2017, at 18:35, Maxime Beauchemin < > [email protected]> > >> wrote: > >>> > >>> This PR is pretty massive and complex! It looks like solid work but > let's > >>> be really careful around testing and rolling this out. > >>> > >>> This may be out of scope for this PR, but wanted to discuss the idea of > >>> using the scheduler's logic to perform backfills. It'd be nice to have > >> that > >>> logic in one place, though I lost grasp on the details around > feasibility > >>> around this approach. I'm sure you looked into this option before > issuing > >>> this PR and I'm curious to hear your thoughts on blockers/challenges > >> around > >>> this alternate approach. > >>> > >>> Also I'm wondering whether we have any sort of mechanisms in our > >>> integration test to validate that task dependencies are respected and > run > >>> in the right order. If not I was thinking we could build some > abstraction > >>> to make it easy to write this type of tests in an expressive way. > >>> > >>> ``` > >>> #[some code to run a backfill, or a scheduler session] > >>> it = IntegrationTestResults(dag_id='exmaple1') > >>> assert it.ran_before('task1', 'task_2') > >>> assert ti.overlapped('task1', 'task_3') # confirms 2 tasks ran in > >> parallel > >>> assert ti.none_failed() > >>> assert ti.ran_last('root') > >>> assert ti.max_concurrency_reached() == POOL_LIMIT > >>> ``` > >>> > >>> Max > >>> > >>>> On Mon, Feb 27, 2017 at 5:41 AM, Bolke de Bruin <[email protected]> > >> wrote: > >>>> > >>>> I have worked in the Backfill issue also in collaboration with > Jeremiah. > >>>> > >>>> The refactor to use dag runs in backfills caused a regression > >>>> in task execution performance as dag runs were executed > >>>> sequentially. Next to that, the backfills were non deterministic > >>>> due to the random execution of tasks, causing root tasks > >>>> being added to the non ready list too soon. > >>>> > >>>> This updates the backfill logic as follows: > >>>> > >>>> • Parallelize execution of tasks > >>>> • Use a leave first execution model; Breadth-first algorithm by > >>>> Jerermiah > >>>> • Replace state updates from the executor by task based only > >>>> updates > >>>> > >>>> https://github.com/apache/incubator-airflow/pull/2107 > >>>> > >>>> Please review and test properly. > >>>> > >>>> What has been left out at the moment is the checking the executor > itself > >>>> for multiple failures of a task run, where the task itself was never > >> able > >>>> to execute. Let me know if this is a real world scenario (maybe when > >> disk > >>>> space issue?). I will add it back in. > >>>> > >>>> - Bolke > >>>> > >>>> > >>>>> On 25 Feb 2017, at 09:07, Bolke de Bruin <[email protected]> wrote: > >>>>> > >>>>> Hi Dan, > >>>>> > >>>>> - Backfill indeed runs only one dagrun at the time, see line 1755 of > >>>> jobs.py. I’ll think about how to fix this over the weekend (I think it > >> was > >>>> my change that introduced this). Suggestions always welcome. Depending > >> the > >>>> impact it is a blocker or not. We don’t often use backfills and > >> definitely > >>>> not at your size, so that is why it didn’t pop up with us. I’m > assuming > >>>> blocker for now, btw. > >>>>> - Speculation on the High DB Load. I’m not sure what your benchmark > is > >>>> here (1.7.1 + multi processor dags?), but as you mentioned in the code > >>>> dependencies are checked a couple of times for one run and even task > >>>> instance. Dependency checking requires aggregation on the DB, which > is a > >>>> performance killer. Annoying but not a blocker. > >>>>> - Skipped tasks potentially cause a dagrun to be marked > failure/success > >>>> prematurely. BranchOperators are widely used if it affects these > >> operators, > >>>> then it is a blocker. > >>>>> > >>>>> - Bolke > >>>>> > >>>>>> On 25 Feb 2017, at 02:04, Dan Davydov <[email protected]. > >> INVALID> > >>>> wrote: > >>>>>> > >>>>>> Update on old pending issues: > >>>>>> - Black Squares in UI: Fix merged > >>>>>> - Double Trigger Issue That Alex G Mentioned: Alex has a PR in > flight > >>>>>> > >>>>>> New Issues: > >>>>>> - Backfill seems to be having issues (only running one dagrun at a > >>>> time), > >>>>>> we are still investigating - might be a blocker > >>>>>> - High DB Load (~8x more than 1.7) - We are still investigating but > >> it's > >>>>>> probably not a blocker for the release > >>>>>> - Skipped tasks potentially cause a dagrun to be marked as > >>>> failure/success > >>>>>> prematurely - not sure whether or not to classify this as a blocker > >>>> (only > >>>>>> really an issue for users who use the BranchingPythonOperator, which > >>>> AirBnB > >>>>>> does) > >>>>>> > >>>>>> On Thu, Feb 23, 2017 at 5:59 PM, siddharth anand <[email protected] > > > >>>> wrote: > >>>>>> > >>>>>>> IMHO, a DAG run without a start date is non-sensical but is not > >>>> enforced > >>>>>>> That said, our UI allows for the manual creation of DAG Runs > without > >> a > >>>>>>> start date as shown in the images below: > >>>>>>> > >>>>>>> > >>>>>>> - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot% > >>>>>>> 202017-02-22%2016.00.40.png?dl=0 > >>>>>>> <https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot% > >>>>>>> 202017-02-22%2016.00.40.png?dl=0> > >>>>>>> - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot% > >>>>>>> 202017-02-22%2016.02.22.png?dl=0 > >>>>>>> <https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot% > >>>>>>> 202017-02-22%2016.02.22.png?dl=0> > >>>>>>> > >>>>>>> > >>>>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin < > >>>>>>> [email protected]> wrote: > >>>>>>> > >>>>>>>> Our database may have edge cases that could be associated with > >> running > >>>>>>> any > >>>>>>>> previous version that may or may not have been part of an official > >>>>>>> release. > >>>>>>>> > >>>>>>>> Let's see if anyone else reports the issue. If no one does, one > >>>> option is > >>>>>>>> to release 1.8.0 as is with a comment in the release notes, and > >> have a > >>>>>>>> future official minor apache release 1.8.1 that would fix these > >> minor > >>>>>>>> issues that are not deal breaker. > >>>>>>>> > >>>>>>>> @bolke, I'm curious, how long does it take you to go through one > >>>> release > >>>>>>>> cycle? Oh, and do you have a documented step by step process for > >>>>>>> releasing? > >>>>>>>> I'd like to add the Pypi part to this doc and add committers that > >> are > >>>>>>>> interested to have rights on the project on Pypi. > >>>>>>>> > >>>>>>>> Max > >>>>>>>> > >>>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin < > [email protected]> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>>> So it is a database integrity issue? Afaik a start_date should > >> always > >>>>>>> be > >>>>>>>>> set for a DagRun (create_dagrun) does so I didn't check the code > >>>>>>> though. > >>>>>>>>> > >>>>>>>>> Sent from my iPhone > >>>>>>>>> > >>>>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <[email protected]. > >>>>>>> INVALID> > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> Should clarify this occurs when a dagrun does not have a start > >> date, > >>>>>>>> not > >>>>>>>>> a > >>>>>>>>>> dag (which makes it even less likely to happen). I don't think > >> this > >>>>>>> is > >>>>>>>> a > >>>>>>>>>> blocker for releasing. > >>>>>>>>>> > >>>>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov < > >>>>>>> [email protected]> > >>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>> I rolled this out in our prod and the webservers failed to load > >> due > >>>>>>> to > >>>>>>>>>>> this commit: > >>>>>>>>>>> > >>>>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag > >>>>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72 > >>>>>>>>>>> > >>>>>>>>>>> This fixed it: > >>>>>>>>>>> - </a> <span id="statuses_info" > >>>>>>>>>>> class="glyphicon glyphicon-info-sign" aria-hidden="true" > >>>>>>> title="Start > >>>>>>>>> Date: > >>>>>>>>>>> {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span> > >>>>>>>>>>> + </a> <span id="statuses_info" > >>>>>>>>>>> class="glyphicon glyphicon-info-sign" > aria-hidden="true"></span> > >>>>>>>>>>> > >>>>>>>>>>> This is caused by assuming that all DAGs have start dates set, > >> so a > >>>>>>>>> broken > >>>>>>>>>>> DAG will take down the whole UI. Not sure if we want to make > >> this a > >>>>>>>>> blocker > >>>>>>>>>>> for the release or not, I'm guessing for most deployments this > >>>> would > >>>>>>>>> occur > >>>>>>>>>>> pretty rarely. I'll submit a PR to fix it soon. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini < > >>>>>>>> [email protected] > >>>>>>>>>> > >>>>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Ack that the vote has already passed, but belated +1 (binding) > >>>>>>>>>>>> > >>>>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin < > >>>> [email protected] > >>>>>>>> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> IPMC Voting can be found here: > >>>>>>>>>>>>> > >>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/ > >>>>>>>>>>>> 201702.mbox/% > >>>>>>>>>>>>> [email protected]%3e < > >>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/ > >>>>>>>>>>>> 201702.mbox/% > >>>>>>>>>>>>> [email protected]%3E> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Kind regards, > >>>>>>>>>>>>> Bolke > >>>>>>>>>>>>> > >>>>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <[email protected] > > > >>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been > >>>>>>> accepted. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 9 “+1” votes received: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> - Maxime Beauchemin (binding) > >>>>>>>>>>>>>> - Arthur Wiedmer (binding) > >>>>>>>>>>>>>> - Dan Davydov (binding) > >>>>>>>>>>>>>> - Jeremiah Lowin (binding) > >>>>>>>>>>>>>> - Siddharth Anand (binding) > >>>>>>>>>>>>>> - Alex van Boxel (binding) > >>>>>>>>>>>>>> - Bolke de Bruin (binding) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> - Jayesh Senjaliya (non-binding) > >>>>>>>>>>>>>> - Yi (non-binding) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Vote thread (start): > >>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator- > >>>>>>>>>>>>> airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188- > >>>>>>>>>>>>> [email protected]%3e <http://mail-archives.apache. > >>>>>>>>>>>>> org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6- > >>>>>>>>>>>> 092E-48D2-AA0F- > >>>>>>>>>>>>> [email protected]%3E> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Next steps: > >>>>>>>>>>>>>> 1) will start the voting process at the IPMC mailinglist. I > do > >>>>>>>> expect > >>>>>>>>>>>>> some changes to be required mostly in documentation maybe a > >>>>>>> license > >>>>>>>>> here > >>>>>>>>>>>>> and there. So, we might end up with changes to stable. As > long > >> as > >>>>>>>>> these > >>>>>>>>>>>> are > >>>>>>>>>>>>> not (significant) code changes I will not re-raise the vote. > >>>>>>>>>>>>>> 2) Only after the positive voting on the IPMC and > >> finalisation I > >>>>>>>> will > >>>>>>>>>>>>> rebrand the RC to Release. > >>>>>>>>>>>>>> 3) I will upload it to the incubator release page, then the > >> tar > >>>>>>>> ball > >>>>>>>>>>>>> needs to propagate to the mirrors. > >>>>>>>>>>>>>> 4) Update the website (can someone volunteer please?) > >>>>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to pypi. It seems > >> we > >>>>>>> can > >>>>>>>>>>>> keep > >>>>>>>>>>>>> the apache branding as lib cloud is doing this as well ( > >>>>>>>>>>>>> https://libcloud.apache.org/downloads.html#pypi-package < > >>>>>>>>>>>>> https://libcloud.apache.org/downloads.html#pypi-package>). > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Jippie! > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Bolke > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>> > >>>> > >>>> > >> >
