I wasn't able to reproduce the described behavior but replied in the JIRA (let's discuss there) https://issues.apache.org/jira/browse/AIRFLOW-920
On Tue, Feb 28, 2017 at 3:56 AM, Bolke de Bruin <[email protected]> wrote: > Gotcha: > > It works, but then slightly different. > > If you added the subdag, do not zoom in, but click on the subdag in the > main dag. Use mark success there. It will then allow you to mark all tasks > successful that are part of the subdag. > > Do we still consider this a blocker? Imho, no as a workaround seems to > exist. > > - Bolke > > > On 27 Feb 2017, at 23:19, Dan Davydov <[email protected]> > wrote: > > > > rc + your patch (and a couple of our own custom ones) > > > > On Mon, Feb 27, 2017 at 2:11 PM, Bolke de Bruin <[email protected]> > wrote: > > > >> Dan > >> > >> Btw are you running with my patch for this? Or still plain rc? > >> > >> Cheers > >> Bolke > >> > >> Sent from my iPhone > >> > >>> On 27 Feb 2017, at 22:46, Bolke de Bruin <[email protected]> wrote: > >>> > >>> I'll have a look. I verified and the code is there to take of this. > >>> > >>> B. > >>> > >>> Sent from my iPhone > >>> > >>>> On 27 Feb 2017, at 22:34, Dan Davydov <[email protected]. > INVALID> > >> wrote: > >>>> > >>>> Repro steps: > >>>> - Create a DAG with a dummy task > >>>> - Let this DAG run for one dagrun > >>>> - Add a new subdag operator that contains a dummy operator to this DAG > >> that > >>>> has depends_on_past set to true > >>>> - click on the white square for the new subdag operator in the DAGs > >> first > >>>> dagrun > >>>> - Click "Zoom into subdag" (takes you to the graph view for the > subdag) > >>>> - Click the dummy task in the graph view and click "Mark Success" > >>>> - Observe that the list of tasks to mark as success is empty (it > should > >>>> contain the dummy task) > >>>> > >>>>> On Mon, Feb 27, 2017 at 1:03 PM, Bolke de Bruin <[email protected]> > >> wrote: > >>>>> > >>>>> Dan > >>>>> > >>>>> Can you elaborate on 2, cause I thought I specifically took care of > >> that. > >>>>> > >>>>> Cheers > >>>>> Bolke > >>>>> > >>>>> Sent from my iPhone > >>>>> > >>>>>> On 27 Feb 2017, at 20:27, Dan Davydov <[email protected]. > >> INVALID> > >>>>> wrote: > >>>>>> > >>>>>> I created https://issues.apache.org/jira/browse/AIRFLOW-921 to > track > >> the > >>>>>> pending issues. > >>>>>> > >>>>>> There are two more issues we found which I included there: > >>>>>> 1. Task instances that have their state manually set to running make > >> the > >>>>> UI > >>>>>> for their DAG unable to parse > >>>>>> 2. Mark success doesn't work for non existent task instances/dagruns > >>>>> which > >>>>>> breaks the subdag use case (setting tasks as successful via the > graph > >>>>> view) > >>>>>> > >>>>>>> On Mon, Feb 27, 2017 at 11:06 AM, Bolke de Bruin < > [email protected]> > >>>>> wrote: > >>>>>>> > >>>>>>> Hey Max > >>>>>>> > >>>>>>> It is massive for sure. Sorry about that ;-). However it is not as > >>>>> massive > >>>>>>> as you might deduct from a first view. 0) run tasks concurrently > >> across > >>>>> dag > >>>>>>> runs 1) ordering of the tasks was added to the loop. 2) calculating > >> of > >>>>>>> deadlocks, running tasks, tasks to run was corrected, 3) relying on > >> the > >>>>>>> executor for status updates was replaced, 4) (tbd) executor failure > >>>>> check > >>>>>>> to protect against endless Ioops. > >>>>>>> > >>>>>>> 0+1 seem bigger than they are due to the amount of lines changed. 2 > >> is a > >>>>>>> subtle change, that touches a couple of lines to pop/push properly. > >> 3) > >>>>> is > >>>>>>> bigger, as I didn't like the reliance on the executor. 4) is old > code > >>>>> that > >>>>>>> needs to be added again. > >>>>>>> > >>>>>>> I probably can leave out 3 which makes 4 mood. The change would be > >>>>>>> smaller. Maybe I could even completely remove 3 and just add 4. > What > >> are > >>>>>>> your thoughts? > >>>>>>> > >>>>>>> The random failures we were seeing were the "implicit" test of not > a > >>>>>>> executing in the right order and then deadlocking. But no explicit > >> tests > >>>>>>> exist. Help would definitely be appreciated. > >>>>>>> > >>>>>>> Yes I thought about using the scheduler and/or reusing logic from > the > >>>>>>> scheduler. I even experimented a little with it but it didn't allow > >> me > >>>>> to > >>>>>>> pass the tests effectively. > >>>>>>> > >>>>>>> What I am planning to do is split the function and make it unit > >> testable > >>>>>>> if you agree with the current approach. > >>>>>>> > >>>>>>> Bolke > >>>>>>> > >>>>>>> Sent from my iPhone > >>>>>>> > >>>>>>>> On 27 Feb 2017, at 18:35, Maxime Beauchemin < > >>>>> [email protected]> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> This PR is pretty massive and complex! It looks like solid work > but > >>>>> let's > >>>>>>>> be really careful around testing and rolling this out. > >>>>>>>> > >>>>>>>> This may be out of scope for this PR, but wanted to discuss the > >> idea of > >>>>>>>> using the scheduler's logic to perform backfills. It'd be nice to > >> have > >>>>>>> that > >>>>>>>> logic in one place, though I lost grasp on the details around > >>>>> feasibility > >>>>>>>> around this approach. I'm sure you looked into this option before > >>>>> issuing > >>>>>>>> this PR and I'm curious to hear your thoughts on > blockers/challenges > >>>>>>> around > >>>>>>>> this alternate approach. > >>>>>>>> > >>>>>>>> Also I'm wondering whether we have any sort of mechanisms in our > >>>>>>>> integration test to validate that task dependencies are respected > >> and > >>>>> run > >>>>>>>> in the right order. If not I was thinking we could build some > >>>>> abstraction > >>>>>>>> to make it easy to write this type of tests in an expressive way. > >>>>>>>> > >>>>>>>> ``` > >>>>>>>> #[some code to run a backfill, or a scheduler session] > >>>>>>>> it = IntegrationTestResults(dag_id='exmaple1') > >>>>>>>> assert it.ran_before('task1', 'task_2') > >>>>>>>> assert ti.overlapped('task1', 'task_3') # confirms 2 tasks ran in > >>>>>>> parallel > >>>>>>>> assert ti.none_failed() > >>>>>>>> assert ti.ran_last('root') > >>>>>>>> assert ti.max_concurrency_reached() == POOL_LIMIT > >>>>>>>> ``` > >>>>>>>> > >>>>>>>> Max > >>>>>>>> > >>>>>>>>> On Mon, Feb 27, 2017 at 5:41 AM, Bolke de Bruin < > [email protected] > >>> > >>>>>>> wrote: > >>>>>>>>> > >>>>>>>>> I have worked in the Backfill issue also in collaboration with > >>>>> Jeremiah. > >>>>>>>>> > >>>>>>>>> The refactor to use dag runs in backfills caused a regression > >>>>>>>>> in task execution performance as dag runs were executed > >>>>>>>>> sequentially. Next to that, the backfills were non deterministic > >>>>>>>>> due to the random execution of tasks, causing root tasks > >>>>>>>>> being added to the non ready list too soon. > >>>>>>>>> > >>>>>>>>> This updates the backfill logic as follows: > >>>>>>>>> > >>>>>>>>> • Parallelize execution of tasks > >>>>>>>>> • Use a leave first execution model; Breadth-first algorithm > by > >>>>>>>>> Jerermiah > >>>>>>>>> • Replace state updates from the executor by task based only > >>>>>>>>> updates > >>>>>>>>> > >>>>>>>>> https://github.com/apache/incubator-airflow/pull/2107 > >>>>>>>>> > >>>>>>>>> Please review and test properly. > >>>>>>>>> > >>>>>>>>> What has been left out at the moment is the checking the executor > >>>>> itself > >>>>>>>>> for multiple failures of a task run, where the task itself was > >> never > >>>>>>> able > >>>>>>>>> to execute. Let me know if this is a real world scenario (maybe > >> when > >>>>>>> disk > >>>>>>>>> space issue?). I will add it back in. > >>>>>>>>> > >>>>>>>>> - Bolke > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> On 25 Feb 2017, at 09:07, Bolke de Bruin <[email protected]> > >> wrote: > >>>>>>>>>> > >>>>>>>>>> Hi Dan, > >>>>>>>>>> > >>>>>>>>>> - Backfill indeed runs only one dagrun at the time, see line > 1755 > >> of > >>>>>>>>> jobs.py. I’ll think about how to fix this over the weekend (I > >> think it > >>>>>>> was > >>>>>>>>> my change that introduced this). Suggestions always welcome. > >> Depending > >>>>>>> the > >>>>>>>>> impact it is a blocker or not. We don’t often use backfills and > >>>>>>> definitely > >>>>>>>>> not at your size, so that is why it didn’t pop up with us. I’m > >>>>> assuming > >>>>>>>>> blocker for now, btw. > >>>>>>>>>> - Speculation on the High DB Load. I’m not sure what your > >> benchmark > >>>>> is > >>>>>>>>> here (1.7.1 + multi processor dags?), but as you mentioned in the > >> code > >>>>>>>>> dependencies are checked a couple of times for one run and even > >> task > >>>>>>>>> instance. Dependency checking requires aggregation on the DB, > which > >>>>> is a > >>>>>>>>> performance killer. Annoying but not a blocker. > >>>>>>>>>> - Skipped tasks potentially cause a dagrun to be marked > >>>>> failure/success > >>>>>>>>> prematurely. BranchOperators are widely used if it affects these > >>>>>>> operators, > >>>>>>>>> then it is a blocker. > >>>>>>>>>> > >>>>>>>>>> - Bolke > >>>>>>>>>> > >>>>>>>>>>> On 25 Feb 2017, at 02:04, Dan Davydov <[email protected]. > >>>>>>> INVALID> > >>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>> Update on old pending issues: > >>>>>>>>>>> - Black Squares in UI: Fix merged > >>>>>>>>>>> - Double Trigger Issue That Alex G Mentioned: Alex has a PR in > >>>>> flight > >>>>>>>>>>> > >>>>>>>>>>> New Issues: > >>>>>>>>>>> - Backfill seems to be having issues (only running one dagrun > at > >> a > >>>>>>>>> time), > >>>>>>>>>>> we are still investigating - might be a blocker > >>>>>>>>>>> - High DB Load (~8x more than 1.7) - We are still investigating > >> but > >>>>>>> it's > >>>>>>>>>>> probably not a blocker for the release > >>>>>>>>>>> - Skipped tasks potentially cause a dagrun to be marked as > >>>>>>>>> failure/success > >>>>>>>>>>> prematurely - not sure whether or not to classify this as a > >> blocker > >>>>>>>>> (only > >>>>>>>>>>> really an issue for users who use the BranchingPythonOperator, > >> which > >>>>>>>>> AirBnB > >>>>>>>>>>> does) > >>>>>>>>>>> > >>>>>>>>>>> On Thu, Feb 23, 2017 at 5:59 PM, siddharth anand < > >> [email protected] > >>>>>> > >>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> IMHO, a DAG run without a start date is non-sensical but is > not > >>>>>>>>> enforced > >>>>>>>>>>>> That said, our UI allows for the manual creation of DAG Runs > >>>>> without > >>>>>>> a > >>>>>>>>>>>> start date as shown in the images below: > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot% > >>>>>>>>>>>> 202017-02-22%2016.00.40.png?dl=0 > >>>>>>>>>>>> <https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot% > >>>>>>>>>>>> 202017-02-22%2016.00.40.png?dl=0> > >>>>>>>>>>>> - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot% > >>>>>>>>>>>> 202017-02-22%2016.02.22.png?dl=0 > >>>>>>>>>>>> <https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot% > >>>>>>>>>>>> 202017-02-22%2016.02.22.png?dl=0> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin < > >>>>>>>>>>>> [email protected]> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Our database may have edge cases that could be associated > with > >>>>>>> running > >>>>>>>>>>>> any > >>>>>>>>>>>>> previous version that may or may not have been part of an > >> official > >>>>>>>>>>>> release. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Let's see if anyone else reports the issue. If no one does, > one > >>>>>>>>> option is > >>>>>>>>>>>>> to release 1.8.0 as is with a comment in the release notes, > and > >>>>>>> have a > >>>>>>>>>>>>> future official minor apache release 1.8.1 that would fix > these > >>>>>>> minor > >>>>>>>>>>>>> issues that are not deal breaker. > >>>>>>>>>>>>> > >>>>>>>>>>>>> @bolke, I'm curious, how long does it take you to go through > >> one > >>>>>>>>> release > >>>>>>>>>>>>> cycle? Oh, and do you have a documented step by step process > >> for > >>>>>>>>>>>> releasing? > >>>>>>>>>>>>> I'd like to add the Pypi part to this doc and add committers > >> that > >>>>>>> are > >>>>>>>>>>>>> interested to have rights on the project on Pypi. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Max > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin < > >>>>> [email protected]> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> So it is a database integrity issue? Afaik a start_date > should > >>>>>>> always > >>>>>>>>>>>> be > >>>>>>>>>>>>>> set for a DagRun (create_dagrun) does so I didn't check the > >> code > >>>>>>>>>>>> though. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Sent from my iPhone > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov < > >> [email protected]. > >>>>>>>>>>>> INVALID> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Should clarify this occurs when a dagrun does not have a > >> start > >>>>>>> date, > >>>>>>>>>>>>> not > >>>>>>>>>>>>>> a > >>>>>>>>>>>>>>> dag (which makes it even less likely to happen). I don't > >> think > >>>>>>> this > >>>>>>>>>>>> is > >>>>>>>>>>>>> a > >>>>>>>>>>>>>>> blocker for releasing. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov < > >>>>>>>>>>>> [email protected]> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I rolled this out in our prod and the webservers failed to > >> load > >>>>>>> due > >>>>>>>>>>>> to > >>>>>>>>>>>>>>>> this commit: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger > >> Dag > >>>>>>>>>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72 > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> This fixed it: > >>>>>>>>>>>>>>>> - </a> <span id="statuses_info" > >>>>>>>>>>>>>>>> class="glyphicon glyphicon-info-sign" aria-hidden="true" > >>>>>>>>>>>> title="Start > >>>>>>>>>>>>>> Date: > >>>>>>>>>>>>>>>> {{last_run.start_date.strftime('%Y-%m-%d > %H:%M')}}"></span> > >>>>>>>>>>>>>>>> + </a> <span id="statuses_info" > >>>>>>>>>>>>>>>> class="glyphicon glyphicon-info-sign" > >>>>> aria-hidden="true"></span> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> This is caused by assuming that all DAGs have start dates > >> set, > >>>>>>> so a > >>>>>>>>>>>>>> broken > >>>>>>>>>>>>>>>> DAG will take down the whole UI. Not sure if we want to > make > >>>>>>> this a > >>>>>>>>>>>>>> blocker > >>>>>>>>>>>>>>>> for the release or not, I'm guessing for most deployments > >> this > >>>>>>>>> would > >>>>>>>>>>>>>> occur > >>>>>>>>>>>>>>>> pretty rarely. I'll submit a PR to fix it soon. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini < > >>>>>>>>>>>>> [email protected] > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Ack that the vote has already passed, but belated +1 > >> (binding) > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin < > >>>>>>>>> [email protected] > >>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> IPMC Voting can be found here: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator- > >> general/ > >>>>>>>>>>>>>>>>> 201702.mbox/% > >>>>>>>>>>>>>>>>>> [email protected]%3e < > >>>>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator- > >> general/ > >>>>>>>>>>>>>>>>> 201702.mbox/% > >>>>>>>>>>>>>>>>>> [email protected]%3E> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Kind regards, > >>>>>>>>>>>>>>>>>> Bolke > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin < > >> [email protected] > >>>>>> > >>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has > been > >>>>>>>>>>>> accepted. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> 9 “+1” votes received: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> - Maxime Beauchemin (binding) > >>>>>>>>>>>>>>>>>>> - Arthur Wiedmer (binding) > >>>>>>>>>>>>>>>>>>> - Dan Davydov (binding) > >>>>>>>>>>>>>>>>>>> - Jeremiah Lowin (binding) > >>>>>>>>>>>>>>>>>>> - Siddharth Anand (binding) > >>>>>>>>>>>>>>>>>>> - Alex van Boxel (binding) > >>>>>>>>>>>>>>>>>>> - Bolke de Bruin (binding) > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> - Jayesh Senjaliya (non-binding) > >>>>>>>>>>>>>>>>>>> - Yi (non-binding) > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Vote thread (start): > >>>>>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator- > >>>>>>>>>>>>>>>>>> airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188- > >>>>>>>>>>>>>>>>>> [email protected]%3e <http://mail-archives.apache. > >>>>>>>>>>>>>>>>>> org/mod_mbox/incubator-airflow-dev/201702.mbox/% > >> 3C7EB7B6D6- > >>>>>>>>>>>>>>>>> 092E-48D2-AA0F- > >>>>>>>>>>>>>>>>>> [email protected]%3E> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Next steps: > >>>>>>>>>>>>>>>>>>> 1) will start the voting process at the IPMC > >> mailinglist. I > >>>>> do > >>>>>>>>>>>>> expect > >>>>>>>>>>>>>>>>>> some changes to be required mostly in documentation > maybe > >> a > >>>>>>>>>>>> license > >>>>>>>>>>>>>> here > >>>>>>>>>>>>>>>>>> and there. So, we might end up with changes to stable. > As > >>>>> long > >>>>>>> as > >>>>>>>>>>>>>> these > >>>>>>>>>>>>>>>>> are > >>>>>>>>>>>>>>>>>> not (significant) code changes I will not re-raise the > >> vote. > >>>>>>>>>>>>>>>>>>> 2) Only after the positive voting on the IPMC and > >>>>>>> finalisation I > >>>>>>>>>>>>> will > >>>>>>>>>>>>>>>>>> rebrand the RC to Release. > >>>>>>>>>>>>>>>>>>> 3) I will upload it to the incubator release page, then > >> the > >>>>>>> tar > >>>>>>>>>>>>> ball > >>>>>>>>>>>>>>>>>> needs to propagate to the mirrors. > >>>>>>>>>>>>>>>>>>> 4) Update the website (can someone volunteer please?) > >>>>>>>>>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to pypi. It > >> seems > >>>>>>> we > >>>>>>>>>>>> can > >>>>>>>>>>>>>>>>> keep > >>>>>>>>>>>>>>>>>> the apache branding as lib cloud is doing this as well ( > >>>>>>>>>>>>>>>>>> https://libcloud.apache.org/downloads.html#pypi-package > < > >>>>>>>>>>>>>>>>>> https://libcloud.apache.org/downloads.html#pypi-package > >>> ). > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Jippie! > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Bolke > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>> > >> > >
