Here is the DAG: http://imgur.com/a/zXXsS

On Thu, Feb 23, 2017 at 12:18 PM, Arthur Wiedmer <[email protected]> wrote:

> Dan,
>
> Inline images get stripped by the mailing server. You will have to upload
> to imgur or something.
>
> Best
> Arthur
>
> On Feb 23, 2017 12:13 PM, "Dan Davydov" <[email protected]> wrote:
>
> > Here is an example for 1, you can see that there are some white tasks
> > that should have been run. I don't have time to create a skeleton DAG at
> > the moment unfortunately because of release-related firefighting. Will
> > hopefully post back here later once firefighting is done.
> > [image: Inline image 1]
> >
> > On Thu, Feb 23, 2017 at 12:00 PM, Bolke de Bruin <[email protected]> wrote:
> >
> >> Hey Dan, Alex,
> >>
> >> Indeed #1 seems serious, specifically the second part - skipping the
> >> root task (root task of the whole DAG?). Do you have a skeleton DAG that
> >> exposes the issue? Is there a root cause analysis? When was the issue
> >> introduced? On the issue Alex mentioned, we don’t see that and I cannot
> >> really align the description of the issue with the PR yet, i.e. I need
> >> clarification.
> >>
> >> Obviously, I’m not very happy if we indeed need to retract the release,
> >> as we are ~12 hours away from closing of the vote at the IPMC mailing
> >> list (strangely enough no one has voted yet). However, if it is so
> >> serious that it cannot wait for 1.8.1 then we need to do it. I would
> >> define “serious” as many people are going to be affected by it and they
> >> will not have a workaround available to them (i.e. patching code or
> >> database), but the opinion of the community might differ.
> >>
> >> Cheers
> >> Bolke
> >>
> >> P.S. I am also interested in #3, as it sounds like an integrity issue
> >> (which verify_integrity should catch) but also maybe too strong an
> >> assumption that such a task should exist (i.e. a task was added to a
> >> DAG at a later stage).
> >>
> >> > On 23 Feb 2017, at 20:15, Dan Davydov <[email protected]> wrote:
> >> >
> >> > Some more issues found by our users in addition to the one Alex
> >> > reported and the UI issue when a dagrun doesn't have a start date:
> >> > 1. If a task fails, the whole dagrun immediately fails. This is a
> >> > very large change to how control flow works, as the rest of the tasks
> >> > in the DAG are not run (even e.g. leaf tasks). The same is true of the
> >> > skipped status (if a leaf task is skipped then the root task for the
> >> > DAG will get skipped and none of the other tasks in the DAG will run).
> >> > 2. The black squares in the UI for tasks that aren't ready to run yet
> >> > are confusing and make it hard for users to see which tasks haven't
> >> > run yet (lower contrast). We should never initialize tasks in the DB
> >> > that do not have a state (or at the least these should be white).
> >> > 3. The Dagrun has a get_task_instance method that will fail if a
> >> > dagrun doesn't have a copy of a task instance created, which we have
> >> > seen happen for some DAGs. This prevents those tasks from getting
> >> > scheduled.
> >> >
> >> > I already patched 3 (and have a PR in flight for open source), and am
> >> > working on a patch for 1 internally. 1 should be a blocker for
> >> > releasing.
> >> >
> >> > On Wed, Feb 22, 2017 at 4:38 PM, Alex Guziel <[email protected]> wrote:
> >> >
> >> >> I have some concern that this change
> >> >> https://github.com/apache/incubator-airflow/pull/1939
> >> >> [AIRFLOW-679] may be having issues because we are seeing lots of
> >> >> double triggers of tasks and tasks being killed as a result.
> >> >>
> >> >> On Wed, Feb 22, 2017 4:35 PM, Dan Davydov [email protected] wrote:
> >> >> Bumping the thread so another user can comment.
> >> >>
> >> >> On Wed, Feb 22, 2017 at 3:12 PM, Maxime Beauchemin <[email protected]> wrote:
> >> >>
> >> >>> What I meant to ask is "how much engineering effort does it take to
> >> >>> bake a single RC?" I guess it depends on how much git-fu is
> >> >>> necessary plus some overhead cost of doing the series of
> >> >>> actions/commands/emails/jira.
> >> >>>
> >> >>> I can volunteer for 1.8.1 (hopefully I can get another Airbnb
> >> >>> engineer/volunteer to tag along) and will try to document/automate
> >> >>> everything I can as I go through the process. The goal of 1.8.1
> >> >>> could be to basically package 1.8.0 + Dan's bugfix, and for Airbnb
> >> >>> to get familiar with the process.
> >> >>>
> >> >>> It'd be great if you can dump your whole process on the wiki, and
> >> >>> we'll improve it on this next pass.
> >> >>>
> >> >>> Thanks again for the mountain of work that went into packaging this
> >> >>> release.
> >> >>>
> >> >>> Max
> >> >>>
> >> >>> On Wed, Feb 22, 2017 at 2:44 PM, Bolke de Bruin <[email protected]> wrote:
> >> >>>
> >> >>>> I thought you volunteered to babysit 1.8.1, Chris ;-)?
> >> >>>>
> >> >>>> Sent from my iPhone
> >> >>>>
> >> >>>>> On 22 Feb 2017, at 23:31, Chris Riccomini <[email protected]> wrote:
> >> >>>>>
> >> >>>>> I'm +1 for doing a 1.8.1 fast follow-on
> >> >>>>>
> >> >>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <[email protected]> wrote:
> >> >>>>>
> >> >>>>>> Our database may have edge cases that could be associated with
> >> >>>>>> running any previous version that may or may not have been part
> >> >>>>>> of an official release.
> >> >>>>>>
> >> >>>>>> Let's see if anyone else reports the issue. If no one does, one
> >> >>>>>> option is to release 1.8.0 as is with a comment in the release
> >> >>>>>> notes, and have a future official minor Apache release 1.8.1 that
> >> >>>>>> would fix these minor issues that are not deal breakers.
> >> >>>>>>
> >> >>>>>> @bolke, I'm curious, how long does it take you to go through one
> >> >>>>>> release cycle? Oh, and do you have a documented step-by-step
> >> >>>>>> process for releasing? I'd like to add the PyPI part to this doc
> >> >>>>>> and give committers that are interested rights on the project on
> >> >>>>>> PyPI.
> >> >>>>>>
> >> >>>>>> Max
> >> >>>>>>
> >> >>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <[email protected]> wrote:
> >> >>>>>>>
> >> >>>>>>> So it is a database integrity issue? Afaik a start_date should
> >> >>>>>>> always be set for a DagRun (create_dagrun does so). I didn't
> >> >>>>>>> check the code, though.
> >> >>>>>>>
> >> >>>>>>> Sent from my iPhone
> >> >>>>>>>
> >> >>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <[email protected]> wrote:
> >> >>>>>>>>
> >> >>>>>>>> Should clarify this occurs when a dagrun does not have a start
> >> >>>>>>>> date, not a dag (which makes it even less likely to happen). I
> >> >>>>>>>> don't think this is a blocker for releasing.
> >> >>>>>>>>
> >> >>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <[email protected]> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>> I rolled this out in our prod and the webservers failed to
> >> >>>>>>>>> load due to this commit:
> >> >>>>>>>>>
> >> >>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> >> >>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> >> >>>>>>>>>
> >> >>>>>>>>> This fixed it:
> >> >>>>>>>>> - </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start Date: {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>
> >> >>>>>>>>> + </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true"></span>
> >> >>>>>>>>>
> >> >>>>>>>>> This is caused by assuming that all DAGs have start dates set,
> >> >>>>>>>>> so a broken DAG will take down the whole UI. Not sure if we
> >> >>>>>>>>> want to make this a blocker for the release or not, I'm
> >> >>>>>>>>> guessing for most deployments this would occur pretty rarely.
> >> >>>>>>>>> I'll submit a PR to fix it soon.
> >> >>>>>>>>>
> >> >>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <[email protected]> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>> Ack that the vote has already passed, but belated +1 (binding)
> >> >>>>>>>>>>
> >> >>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <[email protected]> wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> IPMC Voting can be found here:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%[email protected]%3e
> >> >>>>>>>>>>> <http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%[email protected]%3E>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Kind regards,
> >> >>>>>>>>>>> Bolke
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <[email protected]> wrote:
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Hello,
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been
> >> >>>>>>>>>>>> accepted.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> 9 “+1” votes received:
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> - Maxime Beauchemin (binding)
> >> >>>>>>>>>>>> - Arthur Wiedmer (binding)
> >> >>>>>>>>>>>> - Dan Davydov (binding)
> >> >>>>>>>>>>>> - Jeremiah Lowin (binding)
> >> >>>>>>>>>>>> - Siddharth Anand (binding)
> >> >>>>>>>>>>>> - Alex van Boxel (binding)
> >> >>>>>>>>>>>> - Bolke de Bruin (binding)
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> - Jayesh Senjaliya (non-binding)
> >> >>>>>>>>>>>> - Yi (non-binding)
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Vote thread (start):
> >> >>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-[email protected]%3e
> >> >>>>>>>>>>>> <http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-092E-48D2-AA0F-[email protected]%3E>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Next steps:
> >> >>>>>>>>>>>> 1) I will start the voting process at the IPMC mailing list.
> >> >>>>>>>>>>>> I do expect some changes to be required, mostly in
> >> >>>>>>>>>>>> documentation, maybe a license here and there. So, we might
> >> >>>>>>>>>>>> end up with changes to stable. As long as these are not
> >> >>>>>>>>>>>> (significant) code changes I will not re-raise the vote.
> >> >>>>>>>>>>>> 2) Only after the positive voting on the IPMC and
> >> >>>>>>>>>>>> finalisation will I rebrand the RC to Release.
> >> >>>>>>>>>>>> 3) I will upload it to the incubator release page, then the
> >> >>>>>>>>>>>> tarball needs to propagate to the mirrors.
> >> >>>>>>>>>>>> 4) Update the website (can someone volunteer please?)
> >> >>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to PyPI. It seems
> >> >>>>>>>>>>>> we can keep the Apache branding as libcloud is doing this as
> >> >>>>>>>>>>>> well (https://libcloud.apache.org/downloads.html#pypi-package).
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Jippie!
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Bolke
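
For reference, the webserver failure quoted in the thread comes from calling strftime on a dagrun start date that is not set. The fix quoted in the thread simply removes the tooltip. As an alternative, here is a minimal sketch of a guard that keeps the tooltip when a date is available. This is not the patch from the thread, and format_start_date is a hypothetical helper name, not part of Airflow.

```python
from datetime import datetime
from typing import Optional


def format_start_date(start_date: Optional[datetime],
                      fmt: str = "%Y-%m-%d %H:%M") -> str:
    """Return a display string for a start date, or "" when it is unset.

    Hypothetical helper for illustration only; the format string is the
    one used in the template line quoted in the thread.
    """
    if start_date is None:
        # A run without a start date must not raise while rendering.
        return ""
    return start_date.strftime(fmt)


if __name__ == "__main__":
    print(format_start_date(datetime(2017, 2, 22, 13, 15)))
    print(repr(format_start_date(None)))
```

With a guard like this, a single run with a missing start date would render an empty tooltip rather than break the whole page.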
