My 2c: I observed both #1 and #2 in Dan's list. I figured y'all had had a discussion about the change in behavior. :) In any case, I made my peace with it, and we've been running happily in production for weeks now, so I personally don't see it as a blocker. Obviously, if it's an issue for you guys at Airbnb, a patch and merge to master is critical, but I still think we should fix this stuff as part of 1.8.1.

One compelling counterargument to this is that there's a bit of whiplash in terms of behavior, where 1.7.1.* behaves one way, then 1.8.0 behaves another, then 1.8.1 goes back to the old way again. I guess I'm just not that worried about it. Anyway... take it or leave it. :)

Cheers,
Chris

On Thu, Feb 23, 2017 at 12:31 PM, Bolke de Bruin <[email protected]> wrote:

> Gotcha. Will be patient. Good luck.
>
> Bolke
>
> On 23 Feb 2017, at 21:12, Dan Davydov <[email protected]> wrote:
> >
> > Here is an example for 1, you can see that there are some white tasks that should have been run. I don't have time to create a skeleton DAG at the moment unfortunately because of release-related firefighting. Will hopefully post back here later once firefighting is done.
> >
> > On Thu, Feb 23, 2017 at 12:00 PM, Bolke de Bruin <[email protected]> wrote:
> > Hey Dan, Alex,
> >
> > Indeed #1 seems serious, specifically the second part - skipping the root task (root task of the whole DAG?). Do you have a skeleton DAG that exposes the issue? Is there a root cause analysis? When was the issue introduced? On the issue Alex mentioned, we don’t see that and I cannot really align the description of the issue with the PR yet, i.e. I need clarification.
> >
> > Obviously, I’m not very happy if we indeed need to retract the release as we are ~12 hours away from closing of the vote at the IPMC mailinglist (strangely enough no one has voted yet). However, if it is that serious that it cannot wait for 1.8.1 then we need to do it. I would define “serious” as many people are going to be affected by it and they will not have a workaround available to them (i.e. patching code or database), but the opinion of the community might differ.
> >
> > Cheers
> > Bolke
> >
> > P.S.
> > I am also interested in #3, as it sounds like an integrity issue (which verify_integrity should catch) but also maybe too strong an assumption that such a task should exist (i.e. a task was added to a DAG at a later stage).
> >
> > > On 23 Feb 2017, at 20:15, Dan Davydov <[email protected]> wrote:
> > >
> > > Some more issues found by our users in addition to the one Alex reported and the UI issue when a dagrun doesn't have a start date:
> > > 1. If a task fails, the whole dagrun immediately fails; this is a very large change to how control flow works, as the rest of the tasks in the DAG are not run (even e.g. leaf tasks). The same is true of the skipped status (if a leaf task is skipped then the root task for the DAG will get skipped and none of the other tasks in the DAG will run).
> > > 2. The black squares in the UI for tasks that aren't ready to run yet are confusing and make it hard for users to see which tasks haven't run yet (lower contrast). We should never initialize tasks in the DB that do not have a state (or at the least these should be white).
> > > 3. The Dagrun has a get_task_instance method that will fail if a dagrun doesn't have a copy of a task instance created, which we have seen happen for some DAGs. This prevents those tasks from getting scheduled.
> > >
> > > I already patched 3 (and have a PR in flight for open source), and am working on a patch for 1 internally. 1 should be a blocker for releasing.
> > >
> > > On Wed, Feb 22, 2017 at 4:38 PM, Alex Guziel <[email protected]> wrote:
> > >
> > >> I have some concern that this change
> > >> https://github.com/apache/incubator-airflow/pull/1939
> > >> [AIRFLOW-679] may be having issues because we are seeing lots of double triggers of tasks and tasks being killed as a result.
> > >>
> > >> On Wed, Feb 22, 2017 4:35 PM, Dan Davydov [email protected] wrote:
> > >> Bumping the thread so another user can comment.
> > >>
> > >> On Wed, Feb 22, 2017 at 3:12 PM, Maxime Beauchemin <[email protected]> wrote:
> > >>
> > >> > > >>> What I meant to ask is "how much engineering effort it takes to bake a single RC?", I guess it depends on how much git-fu is necessary plus some overhead cost of doing the series of actions/commands/emails/jira.
> > >> > > >>>
> > >> > > >>> I can volunteer for 1.8.1 (hopefully I can get another Airbnb engineer/volunteer to tag along) and will try to document/automate everything I can as I go through the process. The goal of 1.8.1 could be to basically package 1.8.0 + Dan's bugfix, and for Airbnb to get familiar with the process.
> > >> > > >>>
> > >> > > >>> It'd be great if you can dump your whole process on the wiki, and we'll improve it on this next pass.
> > >> > > >>>
> > >> > > >>> Thanks again for the mountain of work that went into packaging this release.
> > >> > > >>>
> > >> > > >>> Max
> > >> > > >>>
> > >> > > >>> On Wed, Feb 22, 2017 at 2:44 PM, Bolke de Bruin <[email protected]> wrote:
> > >> > > >>>
> > >> > > >>>> I thought you volunteered to babysit 1.8.1, Chris ;-)?
> > >> > > >>>>
> > >> > > >>>> Sent from my iPhone
> > >> > > >>>>
> > >> > > >>>>> On 22 Feb 2017, at 23:31, Chris Riccomini <[email protected]> wrote:
> > >> > > >>>>>
> > >> > > >>>>> I'm +1 for doing a 1.8.1 fast follow-on
> > >> > > >>>>>
> > >> > > >>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <[email protected]> wrote:
> > >> > > >>>>>
> > >> > > >>>>>> Our database may have edge cases that could be associated with running any previous version that may or may not have been part of an official release.
> > >> > > >>>>>>
> > >> > > >>>>>> Let's see if anyone else reports the issue. If no one does, one option is to release 1.8.0 as is with a comment in the release notes, and have a future official minor apache release 1.8.1 that would fix these minor issues that are not deal breakers.
> > >> > > >>>>>>
> > >> > > >>>>>> @bolke, I'm curious, how long does it take you to go through one release cycle? Oh, and do you have a documented step by step process for releasing? I'd like to add the Pypi part to this doc and add committers that are interested to have rights on the project on Pypi.
> > >> > > >>>>>>
> > >> > > >>>>>> Max
> > >> > > >>>>>>
> > >> > > >>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <[email protected]> wrote:
> > >> > > >>>>>>>
> > >> > > >>>>>>> So it is a database integrity issue?
> > >> > > >>>>>>> Afaik a start_date should always be set for a DagRun (create_dagrun does so). I didn't check the code though.
> > >> > > >>>>>>>
> > >> > > >>>>>>> Sent from my iPhone
> > >> > > >>>>>>>
> > >> > > >>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <[email protected]> wrote:
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> Should clarify this occurs when a dagrun does not have a start date, not a dag (which makes it even less likely to happen). I don't think this is a blocker for releasing.
> > >> > > >>>>>>>>
> > >> > > >>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <[email protected]> wrote:
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>> I rolled this out in our prod and the webservers failed to load due to this commit:
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> > >> > > >>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>> This fixed it:
> > >> > > >>>>>>>>> - </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start Date: {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>
> > >> > > >>>>>>>>> + </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true"></span>
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>> This is caused by assuming that all DAGs have start dates set, so a broken DAG will take down the whole UI.
> > >> > > >>>>>>>>> Not sure if we want to make this a blocker for the release or not, I'm guessing for most deployments this would occur pretty rarely. I'll submit a PR to fix it soon.
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <[email protected]> wrote:
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>>> Ack that the vote has already passed, but belated +1 (binding)
> > >> > > >>>>>>>>>>
> > >> > > >>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <[email protected]> wrote:
> > >> > > >>>>>>>>>>
> > >> > > >>>>>>>>>>> IPMC Voting can be found here:
> > >> > > >>>>>>>>>>>
> > >> > > >>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%[email protected]%3e
> > >> > > >>>>>>>>>>>
> > >> > > >>>>>>>>>>> Kind regards,
> > >> > > >>>>>>>>>>> Bolke
> > >> > > >>>>>>>>>>>
> > >> > > >>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <[email protected]> wrote:
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>> Hello,
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>> 9 “+1” votes received:
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>> - Maxime Beauchemin (binding)
> > >> > > >>>>>>>>>>>> - Arthur Wiedmer (binding)
> > >> > > >>>>>>>>>>>> - Dan Davydov (binding)
> > >> > > >>>>>>>>>>>> - Jeremiah Lowin (binding)
> > >> > > >>>>>>>>>>>> - Siddharth Anand (binding)
> > >> > > >>>>>>>>>>>> - Alex van Boxel (binding)
> > >> > > >>>>>>>>>>>> - Bolke de Bruin (binding)
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>> - Jayesh Senjaliya (non-binding)
> > >> > > >>>>>>>>>>>> - Yi (non-binding)
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>> Vote thread (start):
> > >> > > >>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-[email protected]%3e <http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-092E-48D2-AA0F-[email protected]%3E>
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>> Next steps:
> > >> > > >>>>>>>>>>>> 1) I will start the voting process at the IPMC mailinglist. I do expect some changes to be required, mostly in documentation, maybe a license here and there. So, we might end up with changes to stable. As long as these are not (significant) code changes I will not re-raise the vote.
> > >> > > >>>>>>>>>>>> 2) Only after the positive voting on the IPMC and finalisation I will rebrand the RC to Release.
> > >> > > >>>>>>>>>>>> 3) I will upload it to the incubator release page, then the tar ball needs to propagate to the mirrors.
> > >> > > >>>>>>>>>>>> 4) Update the website (can someone volunteer please?)
> > >> > > >>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to pypi. It seems we can keep the apache branding as lib cloud is doing this as well (https://libcloud.apache.org/downloads.html#pypi-package).
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>> Jippie!
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>> Bolke
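
For reference, the webserver failure Dan describes in the thread comes from the template calling strftime on last_run.start_date when a dagrun has no start date. His patch simply drops the tooltip. A rough sketch of an alternative, assuming one wanted to keep the tooltip, is a None-safe formatter along these lines. Note that format_start_date and its fallback parameter are hypothetical names made up for illustration, not part of Airflow:

```python
from datetime import datetime
from typing import Optional


def format_start_date(start_date: Optional[datetime], fallback: str = "n/a") -> str:
    """Format a dagrun start date for display, tolerating a missing value.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    if start_date is None:
        # A dagrun without a start date should not break page rendering.
        return fallback
    # Same format string as the template in the thread.
    return start_date.strftime("%Y-%m-%d %H:%M")


if __name__ == "__main__":
    print(format_start_date(datetime(2017, 2, 22, 13, 15)))
    print(format_start_date(None))
```

Whether a placeholder or no tooltip at all is the better UI choice is a separate question; the only point here is that a missing start date should degrade gracefully instead of taking down the page.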
