rc + your patch (and a couple of our own custom ones) On Mon, Feb 27, 2017 at 2:11 PM, Bolke de Bruin <[email protected]> wrote:
> Dan > > Btw are you running with my patch for this? Or still plain rc? > > Cheers > Bolke > > Sent from my iPhone > > > On 27 Feb 2017, at 22:46, Bolke de Bruin <[email protected]> wrote: > > > > I'll have a look. I verified and the code is there to take of this. > > > > B. > > > > Sent from my iPhone > > > >> On 27 Feb 2017, at 22:34, Dan Davydov <[email protected]> > wrote: > >> > >> Repro steps: > >> - Create a DAG with a dummy task > >> - Let this DAG run for one dagrun > >> - Add a new subdag operator that contains a dummy operator to this DAG > that > >> has depends_on_past set to true > >> - click on the white square for the new subdag operator in the DAGs > first > >> dagrun > >> - Click "Zoom into subdag" (takes you to the graph view for the subdag) > >> - Click the dummy task in the graph view and click "Mark Success" > >> - Observe that the list of tasks to mark as success is empty (it should > >> contain the dummy task) > >> > >>> On Mon, Feb 27, 2017 at 1:03 PM, Bolke de Bruin <[email protected]> > wrote: > >>> > >>> Dan > >>> > >>> Can you elaborate on 2, cause I thought I specifically took care of > that. > >>> > >>> Cheers > >>> Bolke > >>> > >>> Sent from my iPhone > >>> > >>>> On 27 Feb 2017, at 20:27, Dan Davydov <[email protected]. > INVALID> > >>> wrote: > >>>> > >>>> I created https://issues.apache.org/jira/browse/AIRFLOW-921 to track > the > >>>> pending issues. > >>>> > >>>> There are two more issues we found which I included there: > >>>> 1. Task instances that have their state manually set to running make > the > >>> UI > >>>> for their DAG unable to parse > >>>> 2. Mark success doesn't work for non existent task instances/dagruns > >>> which > >>>> breaks the subdag use case (setting tasks as successful via the graph > >>> view) > >>>> > >>>>> On Mon, Feb 27, 2017 at 11:06 AM, Bolke de Bruin <[email protected]> > >>> wrote: > >>>>> > >>>>> Hey Max > >>>>> > >>>>> It is massive for sure. Sorry about that ;-). However it is not as > >>> massive > >>>>> as you might deduct from a first view. 0) run tasks concurrently > across > >>> dag > >>>>> runs 1) ordering of the tasks was added to the loop. 2) calculating > of > >>>>> deadlocks, running tasks, tasks to run was corrected, 3) relying on > the > >>>>> executor for status updates was replaced, 4) (tbd) executor failure > >>> check > >>>>> to protect against endless Ioops. > >>>>> > >>>>> 0+1 seem bigger than they are due to the amount of lines changed. 2 > is a > >>>>> subtle change, that touches a couple of lines to pop/push properly. > 3) > >>> is > >>>>> bigger, as I didn't like the reliance on the executor. 4) is old code > >>> that > >>>>> needs to be added again. > >>>>> > >>>>> I probably can leave out 3 which makes 4 mood. The change would be > >>>>> smaller. Maybe I could even completely remove 3 and just add 4. What > are > >>>>> your thoughts? > >>>>> > >>>>> The random failures we were seeing were the "implicit" test of not a > >>>>> executing in the right order and then deadlocking. But no explicit > tests > >>>>> exist. Help would definitely be appreciated. > >>>>> > >>>>> Yes I thought about using the scheduler and/or reusing logic from the > >>>>> scheduler. I even experimented a little with it but it didn't allow > me > >>> to > >>>>> pass the tests effectively. > >>>>> > >>>>> What I am planning to do is split the function and make it unit > testable > >>>>> if you agree with the current approach. > >>>>> > >>>>> Bolke > >>>>> > >>>>> Sent from my iPhone > >>>>> > >>>>>> On 27 Feb 2017, at 18:35, Maxime Beauchemin < > >>> [email protected]> > >>>>> wrote: > >>>>>> > >>>>>> This PR is pretty massive and complex! It looks like solid work but > >>> let's > >>>>>> be really careful around testing and rolling this out. > >>>>>> > >>>>>> This may be out of scope for this PR, but wanted to discuss the > idea of > >>>>>> using the scheduler's logic to perform backfills. It'd be nice to > have > >>>>> that > >>>>>> logic in one place, though I lost grasp on the details around > >>> feasibility > >>>>>> around this approach. I'm sure you looked into this option before > >>> issuing > >>>>>> this PR and I'm curious to hear your thoughts on blockers/challenges > >>>>> around > >>>>>> this alternate approach. > >>>>>> > >>>>>> Also I'm wondering whether we have any sort of mechanisms in our > >>>>>> integration test to validate that task dependencies are respected > and > >>> run > >>>>>> in the right order. If not I was thinking we could build some > >>> abstraction > >>>>>> to make it easy to write this type of tests in an expressive way. > >>>>>> > >>>>>> ``` > >>>>>> #[some code to run a backfill, or a scheduler session] > >>>>>> it = IntegrationTestResults(dag_id='exmaple1') > >>>>>> assert it.ran_before('task1', 'task_2') > >>>>>> assert ti.overlapped('task1', 'task_3') # confirms 2 tasks ran in > >>>>> parallel > >>>>>> assert ti.none_failed() > >>>>>> assert ti.ran_last('root') > >>>>>> assert ti.max_concurrency_reached() == POOL_LIMIT > >>>>>> ``` > >>>>>> > >>>>>> Max > >>>>>> > >>>>>>> On Mon, Feb 27, 2017 at 5:41 AM, Bolke de Bruin <[email protected] > > > >>>>> wrote: > >>>>>>> > >>>>>>> I have worked in the Backfill issue also in collaboration with > >>> Jeremiah. > >>>>>>> > >>>>>>> The refactor to use dag runs in backfills caused a regression > >>>>>>> in task execution performance as dag runs were executed > >>>>>>> sequentially. Next to that, the backfills were non deterministic > >>>>>>> due to the random execution of tasks, causing root tasks > >>>>>>> being added to the non ready list too soon. > >>>>>>> > >>>>>>> This updates the backfill logic as follows: > >>>>>>> > >>>>>>> • Parallelize execution of tasks > >>>>>>> • Use a leave first execution model; Breadth-first algorithm by > >>>>>>> Jerermiah > >>>>>>> • Replace state updates from the executor by task based only > >>>>>>> updates > >>>>>>> > >>>>>>> https://github.com/apache/incubator-airflow/pull/2107 > >>>>>>> > >>>>>>> Please review and test properly. > >>>>>>> > >>>>>>> What has been left out at the moment is the checking the executor > >>> itself > >>>>>>> for multiple failures of a task run, where the task itself was > never > >>>>> able > >>>>>>> to execute. Let me know if this is a real world scenario (maybe > when > >>>>> disk > >>>>>>> space issue?). I will add it back in. > >>>>>>> > >>>>>>> - Bolke > >>>>>>> > >>>>>>> > >>>>>>>> On 25 Feb 2017, at 09:07, Bolke de Bruin <[email protected]> > wrote: > >>>>>>>> > >>>>>>>> Hi Dan, > >>>>>>>> > >>>>>>>> - Backfill indeed runs only one dagrun at the time, see line 1755 > of > >>>>>>> jobs.py. I’ll think about how to fix this over the weekend (I > think it > >>>>> was > >>>>>>> my change that introduced this). Suggestions always welcome. > Depending > >>>>> the > >>>>>>> impact it is a blocker or not. We don’t often use backfills and > >>>>> definitely > >>>>>>> not at your size, so that is why it didn’t pop up with us. I’m > >>> assuming > >>>>>>> blocker for now, btw. > >>>>>>>> - Speculation on the High DB Load. I’m not sure what your > benchmark > >>> is > >>>>>>> here (1.7.1 + multi processor dags?), but as you mentioned in the > code > >>>>>>> dependencies are checked a couple of times for one run and even > task > >>>>>>> instance. Dependency checking requires aggregation on the DB, which > >>> is a > >>>>>>> performance killer. Annoying but not a blocker. > >>>>>>>> - Skipped tasks potentially cause a dagrun to be marked > >>> failure/success > >>>>>>> prematurely. BranchOperators are widely used if it affects these > >>>>> operators, > >>>>>>> then it is a blocker. > >>>>>>>> > >>>>>>>> - Bolke > >>>>>>>> > >>>>>>>>> On 25 Feb 2017, at 02:04, Dan Davydov <[email protected]. > >>>>> INVALID> > >>>>>>> wrote: > >>>>>>>>> > >>>>>>>>> Update on old pending issues: > >>>>>>>>> - Black Squares in UI: Fix merged > >>>>>>>>> - Double Trigger Issue That Alex G Mentioned: Alex has a PR in > >>> flight > >>>>>>>>> > >>>>>>>>> New Issues: > >>>>>>>>> - Backfill seems to be having issues (only running one dagrun at > a > >>>>>>> time), > >>>>>>>>> we are still investigating - might be a blocker > >>>>>>>>> - High DB Load (~8x more than 1.7) - We are still investigating > but > >>>>> it's > >>>>>>>>> probably not a blocker for the release > >>>>>>>>> - Skipped tasks potentially cause a dagrun to be marked as > >>>>>>> failure/success > >>>>>>>>> prematurely - not sure whether or not to classify this as a > blocker > >>>>>>> (only > >>>>>>>>> really an issue for users who use the BranchingPythonOperator, > which > >>>>>>> AirBnB > >>>>>>>>> does) > >>>>>>>>> > >>>>>>>>> On Thu, Feb 23, 2017 at 5:59 PM, siddharth anand < > [email protected] > >>>> > >>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> IMHO, a DAG run without a start date is non-sensical but is not > >>>>>>> enforced > >>>>>>>>>> That said, our UI allows for the manual creation of DAG Runs > >>> without > >>>>> a > >>>>>>>>>> start date as shown in the images below: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot% > >>>>>>>>>> 202017-02-22%2016.00.40.png?dl=0 > >>>>>>>>>> <https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot% > >>>>>>>>>> 202017-02-22%2016.00.40.png?dl=0> > >>>>>>>>>> - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot% > >>>>>>>>>> 202017-02-22%2016.02.22.png?dl=0 > >>>>>>>>>> <https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot% > >>>>>>>>>> 202017-02-22%2016.02.22.png?dl=0> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin < > >>>>>>>>>> [email protected]> wrote: > >>>>>>>>>> > >>>>>>>>>>> Our database may have edge cases that could be associated with > >>>>> running > >>>>>>>>>> any > >>>>>>>>>>> previous version that may or may not have been part of an > official > >>>>>>>>>> release. > >>>>>>>>>>> > >>>>>>>>>>> Let's see if anyone else reports the issue. If no one does, one > >>>>>>> option is > >>>>>>>>>>> to release 1.8.0 as is with a comment in the release notes, and > >>>>> have a > >>>>>>>>>>> future official minor apache release 1.8.1 that would fix these > >>>>> minor > >>>>>>>>>>> issues that are not deal breaker. > >>>>>>>>>>> > >>>>>>>>>>> @bolke, I'm curious, how long does it take you to go through > one > >>>>>>> release > >>>>>>>>>>> cycle? Oh, and do you have a documented step by step process > for > >>>>>>>>>> releasing? > >>>>>>>>>>> I'd like to add the Pypi part to this doc and add committers > that > >>>>> are > >>>>>>>>>>> interested to have rights on the project on Pypi. > >>>>>>>>>>> > >>>>>>>>>>> Max > >>>>>>>>>>> > >>>>>>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin < > >>> [email protected]> > >>>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> So it is a database integrity issue? Afaik a start_date should > >>>>> always > >>>>>>>>>> be > >>>>>>>>>>>> set for a DagRun (create_dagrun) does so I didn't check the > code > >>>>>>>>>> though. > >>>>>>>>>>>> > >>>>>>>>>>>> Sent from my iPhone > >>>>>>>>>>>> > >>>>>>>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov < > [email protected]. > >>>>>>>>>> INVALID> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Should clarify this occurs when a dagrun does not have a > start > >>>>> date, > >>>>>>>>>>> not > >>>>>>>>>>>> a > >>>>>>>>>>>>> dag (which makes it even less likely to happen). I don't > think > >>>>> this > >>>>>>>>>> is > >>>>>>>>>>> a > >>>>>>>>>>>>> blocker for releasing. > >>>>>>>>>>>>> > >>>>>>>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov < > >>>>>>>>>> [email protected]> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I rolled this out in our prod and the webservers failed to > load > >>>>> due > >>>>>>>>>> to > >>>>>>>>>>>>>> this commit: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger > Dag > >>>>>>>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> This fixed it: > >>>>>>>>>>>>>> - </a> <span id="statuses_info" > >>>>>>>>>>>>>> class="glyphicon glyphicon-info-sign" aria-hidden="true" > >>>>>>>>>> title="Start > >>>>>>>>>>>> Date: > >>>>>>>>>>>>>> {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span> > >>>>>>>>>>>>>> + </a> <span id="statuses_info" > >>>>>>>>>>>>>> class="glyphicon glyphicon-info-sign" > >>> aria-hidden="true"></span> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> This is caused by assuming that all DAGs have start dates > set, > >>>>> so a > >>>>>>>>>>>> broken > >>>>>>>>>>>>>> DAG will take down the whole UI. Not sure if we want to make > >>>>> this a > >>>>>>>>>>>> blocker > >>>>>>>>>>>>>> for the release or not, I'm guessing for most deployments > this > >>>>>>> would > >>>>>>>>>>>> occur > >>>>>>>>>>>>>> pretty rarely. I'll submit a PR to fix it soon. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini < > >>>>>>>>>>> [email protected] > >>>>>>>>>>>>> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Ack that the vote has already passed, but belated +1 > (binding) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin < > >>>>>>> [email protected] > >>>>>>>>>>> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> IPMC Voting can be found here: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator- > general/ > >>>>>>>>>>>>>>> 201702.mbox/% > >>>>>>>>>>>>>>>> [email protected]%3e < > >>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator- > general/ > >>>>>>>>>>>>>>> 201702.mbox/% > >>>>>>>>>>>>>>>> [email protected]%3E> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Kind regards, > >>>>>>>>>>>>>>>> Bolke > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin < > [email protected] > >>>> > >>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been > >>>>>>>>>> accepted. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 9 “+1” votes received: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> - Maxime Beauchemin (binding) > >>>>>>>>>>>>>>>>> - Arthur Wiedmer (binding) > >>>>>>>>>>>>>>>>> - Dan Davydov (binding) > >>>>>>>>>>>>>>>>> - Jeremiah Lowin (binding) > >>>>>>>>>>>>>>>>> - Siddharth Anand (binding) > >>>>>>>>>>>>>>>>> - Alex van Boxel (binding) > >>>>>>>>>>>>>>>>> - Bolke de Bruin (binding) > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> - Jayesh Senjaliya (non-binding) > >>>>>>>>>>>>>>>>> - Yi (non-binding) > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Vote thread (start): > >>>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator- > >>>>>>>>>>>>>>>> airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188- > >>>>>>>>>>>>>>>> [email protected]%3e <http://mail-archives.apache. > >>>>>>>>>>>>>>>> org/mod_mbox/incubator-airflow-dev/201702.mbox/% > 3C7EB7B6D6- > >>>>>>>>>>>>>>> 092E-48D2-AA0F- > >>>>>>>>>>>>>>>> [email protected]%3E> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Next steps: > >>>>>>>>>>>>>>>>> 1) will start the voting process at the IPMC > mailinglist. I > >>> do > >>>>>>>>>>> expect > >>>>>>>>>>>>>>>> some changes to be required mostly in documentation maybe > a > >>>>>>>>>> license > >>>>>>>>>>>> here > >>>>>>>>>>>>>>>> and there. So, we might end up with changes to stable. As > >>> long > >>>>> as > >>>>>>>>>>>> these > >>>>>>>>>>>>>>> are > >>>>>>>>>>>>>>>> not (significant) code changes I will not re-raise the > vote. > >>>>>>>>>>>>>>>>> 2) Only after the positive voting on the IPMC and > >>>>> finalisation I > >>>>>>>>>>> will > >>>>>>>>>>>>>>>> rebrand the RC to Release. > >>>>>>>>>>>>>>>>> 3) I will upload it to the incubator release page, then > the > >>>>> tar > >>>>>>>>>>> ball > >>>>>>>>>>>>>>>> needs to propagate to the mirrors. > >>>>>>>>>>>>>>>>> 4) Update the website (can someone volunteer please?) > >>>>>>>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to pypi. It > seems > >>>>> we > >>>>>>>>>> can > >>>>>>>>>>>>>>> keep > >>>>>>>>>>>>>>>> the apache branding as lib cloud is doing this as well ( > >>>>>>>>>>>>>>>> https://libcloud.apache.org/downloads.html#pypi-package < > >>>>>>>>>>>>>>>> https://libcloud.apache.org/downloads.html#pypi-package > >). > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Jippie! > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Bolke > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>> > >>> >
