I'll have a look. I verified and the code is there to take care of this. B.
Sent from my iPhone

> On 27 Feb 2017, at 22:34, Dan Davydov <[email protected]> wrote:
>
> Repro steps:
> - Create a DAG with a dummy task
> - Let this DAG run for one dagrun
> - Add a new subdag operator that contains a dummy operator to this DAG that
> has depends_on_past set to true
> - Click on the white square for the new subdag operator in the DAG's first
> dagrun
> - Click "Zoom into subdag" (takes you to the graph view for the subdag)
> - Click the dummy task in the graph view and click "Mark Success"
> - Observe that the list of tasks to mark as success is empty (it should
> contain the dummy task)
>
>> On Mon, Feb 27, 2017 at 1:03 PM, Bolke de Bruin <[email protected]> wrote:
>>
>> Dan
>>
>> Can you elaborate on 2, because I thought I specifically took care of that.
>>
>> Cheers
>> Bolke
>>
>> Sent from my iPhone
>>
>>> On 27 Feb 2017, at 20:27, Dan Davydov <[email protected]> wrote:
>>>
>>> I created https://issues.apache.org/jira/browse/AIRFLOW-921 to track the
>>> pending issues.
>>>
>>> There are two more issues we found which I included there:
>>> 1. Task instances that have their state manually set to running make the
>>> UI for their DAG unable to parse
>>> 2. Mark success doesn't work for non-existent task instances/dagruns,
>>> which breaks the subdag use case (setting tasks as successful via the
>>> graph view)
>>>
>>>> On Mon, Feb 27, 2017 at 11:06 AM, Bolke de Bruin <[email protected]> wrote:
>>>>
>>>> Hey Max
>>>>
>>>> It is massive for sure. Sorry about that ;-). However, it is not as
>>>> massive as you might deduce from a first view: 0) run tasks concurrently
>>>> across dag runs, 1) ordering of the tasks was added to the loop,
>>>> 2) calculation of deadlocks, running tasks and tasks to run was
>>>> corrected, 3) reliance on the executor for status updates was replaced,
>>>> 4) (tbd) an executor failure check to protect against endless loops.
>>>>
>>>> 0+1 seem bigger than they are due to the number of lines changed. 2 is a
>>>> subtle change that touches a couple of lines to pop/push properly. 3 is
>>>> bigger, as I didn't like the reliance on the executor. 4 is old code that
>>>> needs to be added again.
>>>>
>>>> I probably can leave out 3, which makes 4 moot. The change would be
>>>> smaller. Maybe I could even completely remove 3 and just add 4. What are
>>>> your thoughts?
>>>>
>>>> The random failures we were seeing were the "implicit" test of not
>>>> executing in the right order and then deadlocking. But no explicit tests
>>>> exist. Help would definitely be appreciated.
>>>>
>>>> Yes, I thought about using the scheduler and/or reusing logic from the
>>>> scheduler. I even experimented a little with it, but it didn't allow me
>>>> to pass the tests effectively.
>>>>
>>>> What I am planning to do is split the function and make it unit testable,
>>>> if you agree with the current approach.
>>>>
>>>> Bolke
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On 27 Feb 2017, at 18:35, Maxime Beauchemin <[email protected]> wrote:
>>>>>
>>>>> This PR is pretty massive and complex! It looks like solid work, but let's
>>>>> be really careful around testing and rolling this out.
>>>>>
>>>>> This may be out of scope for this PR, but I wanted to discuss the idea of
>>>>> using the scheduler's logic to perform backfills. It'd be nice to have
>>>>> that logic in one place, though I lost grasp on the details around the
>>>>> feasibility of this approach. I'm sure you looked into this option before
>>>>> issuing this PR and I'm curious to hear your thoughts on
>>>>> blockers/challenges around this alternate approach.
>>>>>
>>>>> Also I'm wondering whether we have any sort of mechanism in our
>>>>> integration tests to validate that task dependencies are respected and
>>>>> tasks run in the right order. If not, I was thinking we could build some
>>>>> abstraction to make it easy to write these types of tests in an
>>>>> expressive way.
>>>>>
>>>>> ```
>>>>> # [some code to run a backfill, or a scheduler session]
>>>>> # IntegrationTestResults is the proposed abstraction; it does not exist yet
>>>>> it = IntegrationTestResults(dag_id='example1')
>>>>> assert it.ran_before('task1', 'task_2')
>>>>> assert it.overlapped('task1', 'task_3')  # confirms 2 tasks ran in parallel
>>>>> assert it.none_failed()
>>>>> assert it.ran_last('root')
>>>>> assert it.max_concurrency_reached() == POOL_LIMIT
>>>>> ```
>>>>>
>>>>> Max
>>>>>
>>>>>> On Mon, Feb 27, 2017 at 5:41 AM, Bolke de Bruin <[email protected]> wrote:
>>>>>>
>>>>>> I have worked on the Backfill issue, also in collaboration with Jeremiah.
>>>>>>
>>>>>> The refactor to use dag runs in backfills caused a regression
>>>>>> in task execution performance, as dag runs were executed
>>>>>> sequentially. Next to that, the backfills were non-deterministic
>>>>>> due to the random execution of tasks, causing root tasks
>>>>>> to be added to the non-ready list too soon.
>>>>>>
>>>>>> This updates the backfill logic as follows:
>>>>>>
>>>>>> • Parallelize execution of tasks
>>>>>> • Use a leaf first execution model; breadth-first algorithm by
>>>>>> Jeremiah
>>>>>> • Replace state updates from the executor with task-based-only
>>>>>> updates
>>>>>>
>>>>>> https://github.com/apache/incubator-airflow/pull/2107
>>>>>>
>>>>>> Please review and test properly.
>>>>>>
>>>>>> What has been left out at the moment is checking the executor itself
>>>>>> for multiple failures of a task run, where the task itself was never
>>>>>> able to execute. Let me know if this is a real-world scenario (maybe
>>>>>> when there is a disk space issue?). I will add it back in.
>>>>>>
>>>>>> - Bolke
>>>>>>
>>>>>>
>>>>>>> On 25 Feb 2017, at 09:07, Bolke de Bruin <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> - Backfill indeed runs only one dagrun at a time, see line 1755 of
>>>>>>> jobs.py. I'll think about how to fix this over the weekend (I think it
>>>>>>> was my change that introduced this). Suggestions always welcome.
>>>>>>> Depending on the impact it is a blocker or not. We don't often use
>>>>>>> backfills and definitely not at your size, so that is why it didn't
>>>>>>> pop up with us. I'm assuming blocker for now, btw.
>>>>>>> - Speculation on the High DB Load. I'm not sure what your benchmark is
>>>>>>> here (1.7.1 + multi processor dags?), but as you mentioned in the code,
>>>>>>> dependencies are checked a couple of times for one run and even task
>>>>>>> instance. Dependency checking requires aggregation on the DB, which is
>>>>>>> a performance killer. Annoying but not a blocker.
>>>>>>> - Skipped tasks potentially cause a dagrun to be marked failure/success
>>>>>>> prematurely. BranchOperators are widely used; if it affects these
>>>>>>> operators, then it is a blocker.
>>>>>>>
>>>>>>> - Bolke
>>>>>>>
>>>>>>>> On 25 Feb 2017, at 02:04, Dan Davydov <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Update on old pending issues:
>>>>>>>> - Black Squares in UI: Fix merged
>>>>>>>> - Double Trigger Issue That Alex G Mentioned: Alex has a PR in flight
>>>>>>>>
>>>>>>>> New Issues:
>>>>>>>> - Backfill seems to be having issues (only running one dagrun at a
>>>>>>>> time), we are still investigating - might be a blocker
>>>>>>>> - High DB Load (~8x more than 1.7) - We are still investigating but
>>>>>>>> it's probably not a blocker for the release
>>>>>>>> - Skipped tasks potentially cause a dagrun to be marked as
>>>>>>>> failure/success prematurely - not sure whether or not to classify this
>>>>>>>> as a blocker (only really an issue for users who use the
>>>>>>>> BranchPythonOperator, which AirBnB does)
>>>>>>>>
>>>>>>>> On Thu, Feb 23, 2017 at 5:59 PM, siddharth anand <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> IMHO, a DAG run without a start date is nonsensical, but this is not
>>>>>>>>> enforced. That said, our UI allows for the manual creation of DAG Runs
>>>>>>>>> without a start date, as shown in the images below:
>>>>>>>>>
>>>>>>>>> - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%202017-02-22%2016.00.40.png?dl=0
>>>>>>>>> - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%202017-02-22%2016.02.22.png?dl=0
>>>>>>>>>
>>>>>>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Our database may have edge cases that could be associated with
>>>>>>>>>> running any previous version that may or may not have been part of
>>>>>>>>>> an official release.
>>>>>>>>>>
>>>>>>>>>> Let's see if anyone else reports the issue. If no one does, one
>>>>>>>>>> option is to release 1.8.0 as is with a comment in the release notes,
>>>>>>>>>> and have a future official minor Apache release 1.8.1 that would fix
>>>>>>>>>> these minor issues that are not deal breakers.
>>>>>>>>>>
>>>>>>>>>> @bolke, I'm curious, how long does it take you to go through one
>>>>>>>>>> release cycle? Oh, and do you have a documented step-by-step process
>>>>>>>>>> for releasing? I'd like to add the PyPI part to this doc and add
>>>>>>>>>> committers who are interested to have rights on the project on PyPI.
>>>>>>>>>>
>>>>>>>>>> Max
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> So it is a database integrity issue? AFAIK a start_date should
>>>>>>>>>>> always be set for a DagRun (create_dagrun does so), though I didn't
>>>>>>>>>>> check the code.
>>>>>>>>>>>
>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>>
>>>>>>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Should clarify this occurs when a dagrun does not have a start
>>>>>>>>>>>> date, not a dag (which makes it even less likely to happen). I don't
>>>>>>>>>>>> think this is a blocker for releasing.
>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I rolled this out in our prod and the webservers failed to load
>>>>>>>>>>>>> due to this commit:
>>>>>>>>>>>>>
>>>>>>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
>>>>>>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
>>>>>>>>>>>>>
>>>>>>>>>>>>> This fixed it:
>>>>>>>>>>>>> - </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start Date: {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>
>>>>>>>>>>>>> + </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true"></span>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is caused by assuming that all DAGs have start dates set, so
>>>>>>>>>>>>> a broken DAG will take down the whole UI. Not sure if we want to
>>>>>>>>>>>>> make this a blocker for the release or not; I'm guessing for most
>>>>>>>>>>>>> deployments this would occur pretty rarely. I'll submit a PR to
>>>>>>>>>>>>> fix it soon.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ack that the vote has already passed, but belated +1 (binding)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <[email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> IPMC Voting can be found here:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%[email protected]%3e
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>> Bolke
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 9 “+1” votes received:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Maxime Beauchemin (binding)
>>>>>>>>>>>>>>>> - Arthur Wiedmer (binding)
>>>>>>>>>>>>>>>> - Dan Davydov (binding)
>>>>>>>>>>>>>>>> - Jeremiah Lowin (binding)
>>>>>>>>>>>>>>>> - Siddharth Anand (binding)
>>>>>>>>>>>>>>>> - Alex van Boxel (binding)
>>>>>>>>>>>>>>>> - Bolke de Bruin (binding)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Jayesh Senjaliya (non-binding)
>>>>>>>>>>>>>>>> - Yi (non-binding)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Vote thread (start):
>>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-[email protected]%3e <http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-092E-48D2-AA0F-[email protected]%3E>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Next steps:
>>>>>>>>>>>>>>>> 1) I will start the voting process on the IPMC mailing list. I do
>>>>>>>>>>>>>>>> expect some changes to be required, mostly in documentation, maybe
>>>>>>>>>>>>>>>> a license here and there. So, we might end up with changes to
>>>>>>>>>>>>>>>> stable. As long as these are not (significant) code changes, I will
>>>>>>>>>>>>>>>> not re-raise the vote.
>>>>>>>>>>>>>>>> 2) Only after the positive voting on the IPMC and finalisation will
>>>>>>>>>>>>>>>> I rebrand the RC to Release.
>>>>>>>>>>>>>>>> 3) I will upload it to the incubator release page, then the tarball
>>>>>>>>>>>>>>>> needs to propagate to the mirrors.
>>>>>>>>>>>>>>>> 4) Update the website (can someone volunteer please?)
>>>>>>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to PyPI. It seems we can
>>>>>>>>>>>>>>>> keep the Apache branding, as libcloud is doing this as well
>>>>>>>>>>>>>>>> (https://libcloud.apache.org/downloads.html#pypi-package).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jippie!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Bolke
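Max's `IntegrationTestResults` snippet in this thread is a proposal, not an existing Airflow API. The following is a minimal sketch of how such a helper could answer his assertions, assuming the task runs of a single dag run have already been collected into in-memory records. The `TaskRun` record, its fields, and the one-run-per-task_id assumption are all hypothetical; a real version would read task instance start/end dates and states from the metadata DB instead of taking a list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class TaskRun:
    """Hypothetical record of one finished task run (not an Airflow class)."""
    task_id: str
    start: datetime
    end: datetime
    state: str


class IntegrationTestResults:
    """Sketch of the proposed helper; assumes one run per task_id."""

    def __init__(self, runs: List[TaskRun]) -> None:
        self._runs: Dict[str, TaskRun] = {r.task_id: r for r in runs}

    def ran_before(self, first: str, second: str) -> bool:
        # `first` must have finished before `second` started.
        return self._runs[first].end <= self._runs[second].start

    def overlapped(self, a: str, b: str) -> bool:
        # Two intervals overlap if each one starts before the other ends.
        ra, rb = self._runs[a], self._runs[b]
        return ra.start < rb.end and rb.start < ra.end

    def none_failed(self) -> bool:
        return all(r.state != "failed" for r in self._runs.values())

    def ran_last(self, task_id: str) -> bool:
        latest_end = max(r.end for r in self._runs.values())
        return self._runs[task_id].end == latest_end

    def max_concurrency_reached(self) -> int:
        # Sweep line: +1 at each start, -1 at each end. Sorting on (time, delta)
        # puts ends before starts at the same timestamp, so back-to-back runs
        # are not counted as concurrent.
        events = []
        for r in self._runs.values():
            events.append((r.start, 1))
            events.append((r.end, -1))
        events.sort()
        current = peak = 0
        for _, delta in events:
            current += delta
            peak = max(peak, current)
        return peak


if __name__ == "__main__":
    def at(minute: int) -> datetime:
        return datetime(2017, 2, 27, 12, minute)

    it = IntegrationTestResults([
        TaskRun("task1", at(0), at(5), "success"),
        TaskRun("task_2", at(5), at(10), "success"),
        TaskRun("task_3", at(2), at(8), "success"),
        TaskRun("root", at(10), at(12), "success"),
    ])
    assert it.ran_before("task1", "task_2")
    assert it.overlapped("task1", "task_3")
    assert it.none_failed()
    assert it.ran_last("root")
    assert it.max_concurrency_reached() == 2
```

Taking records rather than a `dag_id` keeps the assertions independent of how the run was produced (backfill or scheduler), which is the comparison Max wanted to make; the comparison against `POOL_LIMIT` from his snippet would then be a plain equality check on `max_concurrency_reached()`.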
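The template fix quoted in this thread avoids the webserver crash by dropping the start-date tooltip altogether. If the tooltip is worth keeping, the same failure can instead be avoided by guarding on a missing start date (in the Jinja template, by wrapping the `title` attribute in `{% if last_run.start_date %}...{% endif %}`). The logic of that guard is sketched below in plain Python; the helper name `format_start_date` and the "n/a" placeholder are assumptions for illustration, not code from the Airflow repository.

```python
from datetime import datetime
from typing import Optional


def format_start_date(start_date: Optional[datetime]) -> str:
    """Tooltip text for a dag run's start date (hypothetical helper).

    Returns a placeholder instead of failing when the dag run has no
    start_date, e.g. a DAG run created manually through the UI.
    """
    if start_date is None:
        return "Start Date: n/a"
    return "Start Date: " + start_date.strftime("%Y-%m-%d %H:%M")
```

Calling `.strftime` on a missing value is what broke the original template; checking for `None` first keeps one malformed dag run from taking down the whole DAG list.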
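The backfill PR description in this thread mentions a leaf first, breadth-first execution model, but the PR's code is not included here. As a generic illustration only (not the implementation in PR 2107), here is Kahn's algorithm, a breadth-first topological sort that produces a deterministic order in which every task appears after all of its upstream tasks. The `upstream` mapping format (task_id to a set of upstream task_ids) and the function name are assumptions.

```python
from collections import deque
from typing import Dict, List, Set


def breadth_first_order(upstream: Dict[str, Set[str]]) -> List[str]:
    """Kahn's algorithm: every task is emitted after all of its upstream tasks.

    `upstream` maps task_id -> set of upstream task_ids (assumed format).
    Ready tasks are processed in sorted order, so the result is deterministic.
    """
    tasks = set(upstream)
    for deps in upstream.values():
        tasks |= deps
    # Copy the dependency sets so the caller's mapping is not mutated.
    pending = {t: set(upstream.get(t, ())) for t in tasks}
    downstream: Dict[str, Set[str]] = {t: set() for t in tasks}
    for t, deps in pending.items():
        for d in deps:
            downstream[d].add(t)

    # Start with tasks that have no upstream dependencies.
    ready = deque(sorted(t for t, deps in pending.items() if not deps))
    order: List[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in sorted(downstream[task]):
            pending[child].discard(task)
            if not pending[child]:
                # All of the child's upstream tasks are done: it becomes ready.
                ready.append(child)

    if len(order) != len(tasks):
        raise ValueError("dependency cycle: no valid execution order")
    return order
```

Sorting the ready set is what removes the randomness: two runs over the same DAG always yield the same order, so a task can never be scheduled before an upstream task has been processed.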
