Gotcha: it works, but slightly differently.

If you added the subdag, do not zoom in; click on the subdag in the main DAG instead and use "Mark Success" there. That lets you mark all tasks that are part of the subdag as successful.

Do we still consider this a blocker? IMHO no, as a workaround seems to exist.

- Bolke

> On 27 Feb 2017, at 23:19, Dan Davydov <[email protected]> wrote:
>
> rc + your patch (and a couple of our own custom ones)
>
> On Mon, Feb 27, 2017 at 2:11 PM, Bolke de Bruin <[email protected]> wrote:
>
>> Dan
>>
>> Btw are you running with my patch for this? Or still plain rc?
>>
>> Cheers
>> Bolke
>>
>> Sent from my iPhone
>>
>>> On 27 Feb 2017, at 22:46, Bolke de Bruin <[email protected]> wrote:
>>>
>>> I'll have a look. I verified and the code is there to take care of this.
>>>
>>> B.
>>>
>>> Sent from my iPhone
>>>
>>>> On 27 Feb 2017, at 22:34, Dan Davydov <[email protected]> wrote:
>>>>
>>>> Repro steps:
>>>> - Create a DAG with a dummy task
>>>> - Let this DAG run for one dagrun
>>>> - Add a new subdag operator that contains a dummy operator to this DAG that has depends_on_past set to true
>>>> - Click on the white square for the new subdag operator in the DAG's first dagrun
>>>> - Click "Zoom into subdag" (takes you to the graph view for the subdag)
>>>> - Click the dummy task in the graph view and click "Mark Success"
>>>> - Observe that the list of tasks to mark as success is empty (it should contain the dummy task)
>>>>
>>>>> On Mon, Feb 27, 2017 at 1:03 PM, Bolke de Bruin <[email protected]> wrote:
>>>>>
>>>>> Dan
>>>>>
>>>>> Can you elaborate on 2, cause I thought I specifically took care of that.
>>>>>
>>>>> Cheers
>>>>> Bolke
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>>> On 27 Feb 2017, at 20:27, Dan Davydov <[email protected]> wrote:
>>>>>>
>>>>>> I created https://issues.apache.org/jira/browse/AIRFLOW-921 to track the pending issues.
>>>>>>
>>>>>> There are two more issues we found which I included there:
>>>>>> 1. Task instances that have their state manually set to running make the UI for their DAG unable to parse
>>>>>> 2. Mark success doesn't work for non-existent task instances/dagruns, which breaks the subdag use case (setting tasks as successful via the graph view)
>>>>>>
>>>>>>> On Mon, Feb 27, 2017 at 11:06 AM, Bolke de Bruin <[email protected]> wrote:
>>>>>>>
>>>>>>> Hey Max
>>>>>>>
>>>>>>> It is massive for sure. Sorry about that ;-). However it is not as massive as you might deduce at first glance. 0) run tasks concurrently across dag runs, 1) ordering of the tasks was added to the loop, 2) calculating of deadlocks, running tasks, tasks to run was corrected, 3) relying on the executor for status updates was replaced, 4) (tbd) executor failure check to protect against endless loops.
>>>>>>>
>>>>>>> 0+1 seem bigger than they are due to the number of lines changed. 2 is a subtle change that touches a couple of lines to pop/push properly. 3 is bigger, as I didn't like the reliance on the executor. 4 is old code that needs to be added again.
>>>>>>>
>>>>>>> I can probably leave out 3, which makes 4 moot. The change would be smaller. Maybe I could even completely remove 3 and just add 4. What are your thoughts?
>>>>>>>
>>>>>>> The random failures we were seeing were the "implicit" test of not executing in the right order and then deadlocking. But no explicit tests exist. Help would definitely be appreciated.
>>>>>>>
>>>>>>> Yes, I thought about using the scheduler and/or reusing logic from the scheduler. I even experimented a little with it but it didn't allow me to pass the tests effectively.
>>>>>>>
>>>>>>> What I am planning to do is split the function and make it unit testable if you agree with the current approach.
>>>>>>>
>>>>>>> Bolke
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>>> On 27 Feb 2017, at 18:35, Maxime Beauchemin <[email protected]> wrote:
>>>>>>>>
>>>>>>>> This PR is pretty massive and complex! It looks like solid work but let's be really careful around testing and rolling this out.
>>>>>>>>
>>>>>>>> This may be out of scope for this PR, but wanted to discuss the idea of using the scheduler's logic to perform backfills. It'd be nice to have that logic in one place, though I lost grasp on the details around feasibility around this approach. I'm sure you looked into this option before issuing this PR and I'm curious to hear your thoughts on blockers/challenges around this alternate approach.
>>>>>>>>
>>>>>>>> Also I'm wondering whether we have any sort of mechanisms in our integration test to validate that task dependencies are respected and run in the right order. If not I was thinking we could build some abstraction to make it easy to write this type of tests in an expressive way.
>>>>>>>>
>>>>>>>> ```
>>>>>>>> # [some code to run a backfill, or a scheduler session]
>>>>>>>> it = IntegrationTestResults(dag_id='example1')
>>>>>>>> assert it.ran_before('task1', 'task_2')
>>>>>>>> assert it.overlapped('task1', 'task_3')  # confirms 2 tasks ran in parallel
>>>>>>>> assert it.none_failed()
>>>>>>>> assert it.ran_last('root')
>>>>>>>> assert it.max_concurrency_reached() == POOL_LIMIT
>>>>>>>> ```
>>>>>>>>
>>>>>>>> Max
>>>>>>>>
>>>>>>>>> On Mon, Feb 27, 2017 at 5:41 AM, Bolke de Bruin <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> I have worked on the backfill issue, also in collaboration with Jeremiah.
>>>>>>>>>
>>>>>>>>> The refactor to use dag runs in backfills caused a regression in task execution performance as dag runs were executed sequentially. Next to that, the backfills were non-deterministic due to the random execution of tasks, causing root tasks being added to the non ready list too soon.
>>>>>>>>>
>>>>>>>>> This updates the backfill logic as follows:
>>>>>>>>>
>>>>>>>>> • Parallelize execution of tasks
>>>>>>>>> • Use a leave first execution model; Breadth-first algorithm by Jeremiah
>>>>>>>>> • Replace state updates from the executor by task based only updates
>>>>>>>>>
>>>>>>>>> https://github.com/apache/incubator-airflow/pull/2107
>>>>>>>>>
>>>>>>>>> Please review and test properly.
>>>>>>>>>
>>>>>>>>> What has been left out at the moment is checking the executor itself for multiple failures of a task run, where the task itself was never able to execute. Let me know if this is a real-world scenario (maybe with a disk space issue?). I will add it back in.
>>>>>>>>>
>>>>>>>>> - Bolke
>>>>>>>>>
>>>>>>>>>> On 25 Feb 2017, at 09:07, Bolke de Bruin <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Dan,
>>>>>>>>>>
>>>>>>>>>> - Backfill indeed runs only one dagrun at a time, see line 1755 of jobs.py. I’ll think about how to fix this over the weekend (I think it was my change that introduced this). Suggestions always welcome. Depending on the impact it is a blocker or not. We don’t often use backfills and definitely not at your size, so that is why it didn’t pop up with us. I’m assuming blocker for now, btw.
>>>>>>>>>> - Speculation on the high DB load. I’m not sure what your benchmark is here (1.7.1 + multi processor dags?), but as you mentioned in the code dependencies are checked a couple of times for one run and even task instance. Dependency checking requires aggregation on the DB, which is a performance killer. Annoying but not a blocker.
>>>>>>>>>> - Skipped tasks potentially cause a dagrun to be marked failure/success prematurely. BranchOperators are widely used; if it affects these operators, then it is a blocker.
>>>>>>>>>>
>>>>>>>>>> - Bolke
>>>>>>>>>>
>>>>>>>>>>> On 25 Feb 2017, at 02:04, Dan Davydov <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Update on old pending issues:
>>>>>>>>>>> - Black Squares in UI: Fix merged
>>>>>>>>>>> - Double Trigger Issue That Alex G Mentioned: Alex has a PR in flight
>>>>>>>>>>>
>>>>>>>>>>> New Issues:
>>>>>>>>>>> - Backfill seems to be having issues (only running one dagrun at a time), we are still investigating - might be a blocker
>>>>>>>>>>> - High DB Load (~8x more than 1.7) - We are still investigating but it's probably not a blocker for the release
>>>>>>>>>>> - Skipped tasks potentially cause a dagrun to be marked as failure/success prematurely - not sure whether or not to classify this as a blocker (only really an issue for users who use the BranchPythonOperator, which AirBnB does)
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Feb 23, 2017 at 5:59 PM, siddharth anand <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> IMHO, a DAG run without a start date is nonsensical but is not enforced. That said, our UI allows for the manual creation of DAG Runs without a start date as shown in the images below:
>>>>>>>>>>>>
>>>>>>>>>>>> - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%202017-02-22%2016.00.40.png?dl=0
>>>>>>>>>>>> - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%202017-02-22%2016.02.22.png?dl=0
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Our database may have edge cases that could be associated with running any previous version that may or may not have been part of an official release.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let's see if anyone else reports the issue. If no one does, one option is to release 1.8.0 as is with a comment in the release notes, and have a future official minor apache release 1.8.1 that would fix these minor issues that are not deal breakers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> @bolke, I'm curious, how long does it take you to go through one release cycle? Oh, and do you have a documented step by step process for releasing? I'd like to add the Pypi part to this doc and add committers that are interested to have rights on the project on Pypi.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Max
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> So it is a database integrity issue? Afaik a start_date should always be set for a DagRun (create_dagrun does so). I didn't check the code though.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Should clarify this occurs when a dagrun does not have a start date, not a dag (which makes it even less likely to happen). I don't think this is a blocker for releasing.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I rolled this out in our prod and the webservers failed to load due to this commit:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
>>>>>>>>>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This fixed it:
>>>>>>>>>>>>>>>> - </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start Date: {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>
>>>>>>>>>>>>>>>> + </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true"></span>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is caused by assuming that all DAGs have start dates set, so a broken DAG will take down the whole UI. Not sure if we want to make this a blocker for the release or not, I'm guessing for most deployments this would occur pretty rarely. I'll submit a PR to fix it soon.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ack that the vote has already passed, but belated +1 (binding)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> IPMC Voting can be found here:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%[email protected]%3e
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>> Bolke
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 9 “+1” votes received:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Maxime Beauchemin (binding)
>>>>>>>>>>>>>>>>>>> - Arthur Wiedmer (binding)
>>>>>>>>>>>>>>>>>>> - Dan Davydov (binding)
>>>>>>>>>>>>>>>>>>> - Jeremiah Lowin (binding)
>>>>>>>>>>>>>>>>>>> - Siddharth Anand (binding)
>>>>>>>>>>>>>>>>>>> - Alex van Boxel (binding)
>>>>>>>>>>>>>>>>>>> - Bolke de Bruin (binding)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> - Jayesh Senjaliya (non-binding)
>>>>>>>>>>>>>>>>>>> - Yi (non-binding)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Vote thread (start):
>>>>>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-[email protected]%3e <http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-092E-48D2-AA0F-[email protected]%3E>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Next steps:
>>>>>>>>>>>>>>>>>>> 1) will start the voting process at the IPMC mailinglist. I do expect some changes to be required, mostly in documentation, maybe a license here and there. So, we might end up with changes to stable. As long as these are not (significant) code changes I will not re-raise the vote.
>>>>>>>>>>>>>>>>>>> 2) Only after the positive voting on the IPMC and finalisation I will rebrand the RC to Release.
>>>>>>>>>>>>>>>>>>> 3) I will upload it to the incubator release page, then the tar ball needs to propagate to the mirrors.
>>>>>>>>>>>>>>>>>>> 4) Update the website (can someone volunteer please?)
>>>>>>>>>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to pypi. It seems we can keep the apache branding as lib cloud is doing this as well (https://libcloud.apache.org/downloads.html#pypi-package).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Jippie!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Bolke
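
Note on the test abstraction Max proposes in his 27 Feb message: the thread only names the assertions, so the following is one possible sketch of how they could be backed, not anything that exists in Airflow. It assumes the test harness can hand over one record per finished task with start time, end time and final state; the `TaskRun` record and the constructor taking a list (rather than a `dag_id`) are assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class TaskRun:
    """One finished task instance; a hypothetical record, not an Airflow model."""
    task_id: str
    start: datetime
    end: datetime
    state: str  # e.g. "success" or "failed"


class IntegrationTestResults:
    """Ordering and concurrency checks over recorded task runs (hypothetical helper)."""

    def __init__(self, runs: List[TaskRun]) -> None:
        self.runs: Dict[str, TaskRun] = {r.task_id: r for r in runs}

    def ran_before(self, first: str, second: str) -> bool:
        # True when `first` finished no later than `second` started.
        return self.runs[first].end <= self.runs[second].start

    def overlapped(self, a: str, b: str) -> bool:
        # Two intervals overlap when each one starts before the other ends.
        ra, rb = self.runs[a], self.runs[b]
        return ra.start < rb.end and rb.start < ra.end

    def none_failed(self) -> bool:
        return all(r.state != "failed" for r in self.runs.values())

    def ran_last(self, task_id: str) -> bool:
        # True when no other task ended after this one.
        return self.runs[task_id].end == max(r.end for r in self.runs.values())

    def max_concurrency_reached(self) -> int:
        # Sweep over start/end events. At equal timestamps the -1 (end) event
        # sorts before the +1 (start) event, so back-to-back tasks are not
        # counted as running concurrently.
        events = []
        for r in self.runs.values():
            events.append((r.start, 1))
            events.append((r.end, -1))
        running = peak = 0
        for _, delta in sorted(events):
            running += delta
            peak = max(peak, running)
        return peak


def at(minute: int) -> datetime:
    """Helper for the example data below: a fixed day, varying minute."""
    return datetime(2017, 2, 27, 12, minute)


results = IntegrationTestResults([
    TaskRun("task1", at(0), at(5), "success"),
    TaskRun("task_3", at(2), at(6), "success"),
    TaskRun("task_2", at(6), at(8), "success"),
    TaskRun("root", at(8), at(9), "success"),
])
assert results.ran_before("task1", "task_2")
assert results.overlapped("task1", "task_3")  # the two ran in parallel
assert results.ran_last("root")
```

Working from recorded intervals keeps the assertions independent of whether the run came from a backfill or a scheduler session; how those intervals would be collected from a real run is left open here.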

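
Note on the webserver failure Dan reports on 22 Feb: the template calls `strftime` on `last_run.start_date`, which fails when a dagrun has no start date. Dan's fix removes the tooltip entirely; an alternative would be to guard the missing value. The sketch below shows that guard in plain Python rather than in the template. The function name and the fallback text are assumptions for illustration, not what the eventual PR does.

```python
from datetime import datetime
from typing import Optional


def start_date_title(start_date: Optional[datetime]) -> str:
    """Build the tooltip text, tolerating a run with no start date (hypothetical helper)."""
    if start_date is None:
        # A dagrun created without a start date should not break rendering.
        return "Start Date: not set"
    return "Start Date: " + start_date.strftime("%Y-%m-%d %H:%M")
```

With a guard like this a single dagrun lacking a start date would render a placeholder instead of taking down the page for every DAG.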