My 2c:

I observed both #1 and #2 in Dan's list. I figured y'all had had a
discussion about the change in behavior. :) In any case, I made my peace
with it, and we've been running happily in production for weeks now, so I
personally don't see it as a blocker. Obviously, if it's an issue for you
guys at Airbnb, a patch and merge to master is critical, but I still think
we should fix this stuff as part of 1.8.1.

One compelling counterargument to this is that there's a bit of whiplash
in terms of behavior, where 1.7.1.* behaves one way, then 1.8.0 behaves
another, then 1.8.1 goes back to the old way again. I guess I'm just not
that worried about it.

Anyway... take it or leave it. :)

Cheers,
Chris

On Thu, Feb 23, 2017 at 12:31 PM, Bolke de Bruin <[email protected]> wrote:

> Gotcha. Will be patient. Good luck.
>
> Bolke
>
> > On 23 Feb 2017, at 21:12, Dan Davydov <[email protected]>
> wrote:
> >
> > Here is an example for 1; you can see that there are some white tasks
> that should have been run. I don't have time to create a skeleton DAG at
> the moment unfortunately because of release-related firefighting. Will
> hopefully post back here later once firefighting is done.
> >
> >
> > On Thu, Feb 23, 2017 at 12:00 PM, Bolke de Bruin <[email protected]> wrote:
> > Hey Dan, Alex,
> >
> > Indeed #1 seems serious, specifically the second part - skipping the
> root task (root task of the whole DAG?). Do you have a skeleton DAG that
> exposes the issue? Is there a root cause analysis? When was the issue
> introduced? On the issue Alex mentioned, we don’t see that and I cannot
> really align the description of the issue with the PR yet, i.e. I need
> clarification.
> >
> > Obviously, I’m not very happy if we indeed need to retract the release
> as we are ~12 hours away from closing of the vote at the IPMC mailing list
> (strangely enough no one has voted yet). However, if it is that serious
> that it cannot wait for 1.8.1 then we need to do it. I would define
> “serious” as many people are going to be affected by it and they will not
> have a workaround available to them (i.e. patching code or database), but
> the opinion of the community might differ.
> >
> > Cheers
> > Bolke
> >
> > P.S. I am also interested in #3, as it sounds like an integrity issue
> (which verify_integrity should catch) but also maybe too strong an
> assumption that such a task should exist (i.e. a task was added to a DAG at
> a later stage).
> >
> >
> > > On 23 Feb 2017, at 20:15, Dan Davydov <[email protected]> wrote:
> > >
> > > Some more issues found by our users in addition to the one Alex
> reported
> > > and the UI issue when a dagrun doesn't have a start date:
> > > > 1. If a task fails, the whole dagrun immediately fails; this
> is a
> > > very large change to how control flow works as the rest of the tasks
> in the
> > > DAG are not run (even e.g. leaf tasks). The same is true of the skipped
> > > status (if a leaf task is skipped then the root task for the DAG will
> get
> > > skipped and none of the other tasks in the DAG will run).
> > > 2. The black squares in the UI for tasks that aren't ready to run yet
> are
> > > confusing and make it hard for users to see which tasks haven't run yet
> > > (lower contrast). We should never initialize tasks in the DB that do
> not
> > > have a state (or at the least these should be white).
> > > 3. The Dagrun has a get_task_instance method that will fail if a dagrun
> > > doesn't have a copy of a task instance created which we have seen
> happen
> > > for some DAGs. This prevents those tasks from getting scheduled.
> > >
> > > I already patched 3 (and have a PR in flight for open source), and am
> > > working on a patch for 1 internally. 1 should be a blocker for
> releasing.
> > >
> > > On Wed, Feb 22, 2017 at 4:38 PM, Alex Guziel <[email protected]
> <mailto:[email protected]>.invalid
> > >> wrote:
> > >
> > >> I have some concern that this change
> > > >> https://github.com/apache/incubator-airflow/pull/1939
> > >> [AIRFLOW-679] may be having issues because we are seeing lots of
> double
> > >> triggers
> > >> of tasks and tasks being killed as a result.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, Feb 22, 2017 4:35 PM, Dan Davydov
> [email protected]
> > >> wrote:
> > >> Bumping the thread so another user can comment.
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, Feb 22, 2017 at 3:12 PM, Maxime Beauchemin <
> > >>
> > >> [email protected] <mailto:[email protected]>>
> wrote:
> > >>
> > >>
> > >>
> > >>
> > >>> What I meant to ask is "how much engineering effort it takes to bake
> a
> > >>
> > >>> single RC?", I guess it depends on how much git-fu is necessary plus
> some
> > >>
> > >>> overhead cost of doing the series of actions/commands/emails/jira.
> > >>
> > >>>
> > >>
> > > >>> I can volunteer for 1.8.1 (hopefully I can do it with another
> Airbnb
> > >>
> > >>> engineer/volunteer to tag along) and will try to document/automate
> > >>
> > >>> everything I can as I go through the process. The goal of 1.8.1
> could be
> > >> to
> > >>
> > >>> basically package 1.8.0 + Dan's bugfix, and for Airbnb to get
> familiar
> > >> with
> > >>
> > >>> the process.
> > >>
> > >>>
> > >>
> > >>> It'd be great if you can dump your whole process on the wiki, and
> we'll
> > >>
> > >>> improve it on this next pass.
> > >>
> > >>>
> > >>
> > >>> Thanks again for the mountain of work that went into packaging this
> > >>
> > >>> release.
> > >>
> > >>>
> > >>
> > >>> Max
> > >>
> > >>>
> > >>
> > >>> On Wed, Feb 22, 2017 at 2:44 PM, Bolke de Bruin <[email protected]
> <mailto:[email protected]>>
> > >> wrote:
> > >>
> > >>>
> > >>
> > > >>>> I thought you volunteered to babysit 1.8.1 Chris ;-)?
> > >>
> > >>>>
> > >>
> > >>>> Sent from my iPhone
> > >>
> > >>>>
> > >>
> > >>>>> On 22 Feb 2017, at 23:31, Chris Riccomini <[email protected]
> <mailto:[email protected]>>
> > >>
> > >>> wrote:
> > >>
> > >>>>>
> > >>
> > >>>>> I'm +1 for doing a 1.8.1 fast follow-on
> > >>
> > >>>>>
> > >>
> > >>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <
> > >>
> > >>>>> [email protected] <mailto:[email protected]>>
> wrote:
> > >>
> > >>>>>
> > >>
> > >>>>>> Our database may have edge cases that could be associated with
> > >> running
> > >>
> > >>>> any
> > >>
> > >>>>>> previous version that may or may not have been part of an official
> > >>
> > >>>> release.
> > >>
> > >>>>>>
> > >>
> > >>>>>> Let's see if anyone else reports the issue. If no one does, one
> > >> option
> > >>
> > >>>> is
> > >>
> > >>>>>> to release 1.8.0 as is with a comment in the release notes, and
> > >> have a
> > >>
> > >>>>>> future official minor apache release 1.8.1 that would fix these
> > >> minor
> > >>
> > > >>>>>> issues that are not deal breakers.
> > >>
> > >>>>>>
> > >>
> > >>>>>> @bolke, I'm curious, how long does it take you to go through one
> > >>
> > >>> release
> > >>
> > >>>>>> cycle? Oh, and do you have a documented step by step process for
> > >>
> > >>>> releasing?
> > >>
> > >>>>>> I'd like to add the Pypi part to this doc and add committers that
> > >> are
> > >>
> > >>>>>> interested to have rights on the project on Pypi.
> > >>
> > >>>>>>
> > >>
> > >>>>>> Max
> > >>
> > >>>>>>
> > >>
> > >>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <
> [email protected] <mailto:[email protected]>
> > >>>
> > >>
> > >>>> wrote:
> > >>
> > >>>>>>>
> > >>
> > > >>>>>>> So it is a database integrity issue? AFAIK a start_date should
> > >> always
> > >>
> > >>>> be
> > >>
> > > >>>>>>> set for a DagRun (create_dagrun does so). I didn't check the code
> > >>
> > >>>> though.
> > >>
> > >>>>>>>
> > >>
> > >>>>>>> Sent from my iPhone
> > >>
> > >>>>>>>
> > >>
> > >>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <[email protected]
> <mailto:[email protected]>.
> > >>
> > >>>> INVALID>
> > >>
> > >>>>>>> wrote:
> > >>
> > >>>>>>>>
> > >>
> > >>>>>>>> Should clarify this occurs when a dagrun does not have a start
> > >> date,
> > >>
> > >>>>>> not
> > >>
> > >>>>>>> a
> > >>
> > >>>>>>>> dag (which makes it even less likely to happen). I don't think
> > >> this
> > >>
> > >>> is
> > >>
> > >>>>>> a
> > >>
> > >>>>>>>> blocker for releasing.
> > >>
> > >>>>>>>>
> > >>
> > >>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <
> > >>
> > >>> [email protected] <mailto:[email protected]>
> > >>
> > >>>>>
> > >>
> > >>>>>>> wrote:
> > >>
> > >>>>>>>>>
> > >>
> > >>>>>>>>> I rolled this out in our prod and the webservers failed to load
> > >> due
> > >>
> > >>>> to
> > >>
> > >>>>>>>>> this commit:
> > >>
> > >>>>>>>>>
> > >>
> > >>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> > >>
> > >>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> > >>
> > >>>>>>>>>
> > >>
> > >>>>>>>>> This fixed it:
> > >>
> > >>>>>>>>> - </a> <span id="statuses_info"
> > >>
> > >>>>>>>>> class="glyphicon glyphicon-info-sign" aria-hidden="true"
> > >>
> > >>> title="Start
> > >>
> > >>>>>>> Date:
> > >>
> > >>>>>>>>> {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>
> > >>
> > >>>>>>>>> + </a> <span id="statuses_info"
> > >>
> > >>>>>>>>> class="glyphicon glyphicon-info-sign"
> aria-hidden="true"></span>
> > >>
> > >>>>>>>>>
> > >>
> > >>>>>>>>> This is caused by assuming that all DAGs have start dates set,
> > >> so a
> > >>
> > >>>>>>> broken
> > >>
> > >>>>>>>>> DAG will take down the whole UI. Not sure if we want to make
> > >> this a
> > >>
> > >>>>>>> blocker
> > >>
> > >>>>>>>>> for the release or not, I'm guessing for most deployments this
> > >>
> > >>> would
> > >>
> > >>>>>>> occur
> > >>
> > >>>>>>>>> pretty rarely. I'll submit a PR to fix it soon.
> > >>
> > >>>>>>>>>
> > >>
> > >>>>>>>>>
> > >>
> > >>>>>>>>>
> > >>
> > >>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <
> > >>
> > >>>>>> [email protected] <mailto:[email protected]>
> > >>
> > >>>>>>>>
> > >>
> > >>>>>>>>> wrote:
> > >>
> > >>>>>>>>>
> > >>
> > >>>>>>>>>> Ack that the vote has already passed, but belated +1 (binding)
> > >>
> > >>>>>>>>>>
> > >>
> > >>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <
> > >>
> > >>> [email protected] <mailto:[email protected]>>
> > >>
> > >>>>>>>>>> wrote:
> > >>
> > >>>>>>>>>>
> > >>
> > >>>>>>>>>>> IPMC Voting can be found here:
> > >>
> > >>>>>>>>>>>
> > >>
> > >>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/
> <http://mail-archives.apache.org/mod_mbox/incubator-general/>
> > >>
> > >>>>>>>>>> 201702.mbox/%
> > >>
> > >>>>>>>>>>> [email protected] <mailto:
> [email protected]>%3e <
> > >>
> > >>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/
> <http://mail-archives.apache.org/mod_mbox/incubator-general/>
> > >>
> > >>>>>>>>>> 201702.mbox/%
> > >>
> > >>>>>>>>>>> [email protected] <mailto:
> [email protected]>%3E>
> > >>
> > >>>>>>>>>>>
> > >>
> > >>>>>>>>>>> Kind regards,
> > >>
> > >>>>>>>>>>> Bolke
> > >>
> > >>>>>>>>>>>
> > >>
> > >>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <[email protected]
> <mailto:[email protected]>>
> > >>
> > >>>>>> wrote:
> > >>
> > >>>>>>>>>>>>
> > >>
> > >>>>>>>>>>>> Hello,
> > >>
> > >>>>>>>>>>>>
> > >>
> > >>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been
> > >>
> > >>>> accepted.
> > >>
> > >>>>>>>>>>>>
> > >>
> > >>>>>>>>>>>> 9 “+1” votes received:
> > >>
> > >>>>>>>>>>>>
> > >>
> > >>>>>>>>>>>> - Maxime Beauchemin (binding)
> > >>
> > >>>>>>>>>>>> - Arthur Wiedmer (binding)
> > >>
> > >>>>>>>>>>>> - Dan Davydov (binding)
> > >>
> > >>>>>>>>>>>> - Jeremiah Lowin (binding)
> > >>
> > >>>>>>>>>>>> - Siddharth Anand (binding)
> > >>
> > >>>>>>>>>>>> - Alex van Boxel (binding)
> > >>
> > >>>>>>>>>>>> - Bolke de Bruin (binding)
> > >>
> > >>>>>>>>>>>>
> > >>
> > >>>>>>>>>>>> - Jayesh Senjaliya (non-binding)
> > >>
> > >>>>>>>>>>>> - Yi (non-binding)
> > >>
> > >>>>>>>>>>>>
> > >>
> > >>>>>>>>>>>> Vote thread (start):
> > >>
> > >>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator- <
> http://mail-archives.apache.org/mod_mbox/incubator->
> > >>
> > >>>>>>>>>>> airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-
> > >>
> > >>>>>>>>>>> [email protected] <mailto:[email protected]>%3e <
> http://mail-archives.apache <http://mail-archives.apache/>.
> > >>
> > >>>>>>>>>>> org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-
> > >>
> > >>>>>>>>>> 092E-48D2-AA0F-
> > >>
> > >>>>>>>>>>> [email protected] <mailto:[email protected]>%3E>
> > >>
> > >>>>>>>>>>>>
> > >>
> > >>>>>>>>>>>> Next steps:
> > >>
> > >>>>>>>>>>>> 1) will start the voting process at the IPMC mailinglist. I
> do
> > >>
> > >>>>>> expect
> > >>
> > >>>>>>>>>>> some changes to be required mostly in documentation maybe a
> > >>
> > >>> license
> > >>
> > >>>>>>> here
> > >>
> > >>>>>>>>>>> and there. So, we might end up with changes to stable. As
> long
> > >> as
> > >>
> > >>>>>>> these
> > >>
> > >>>>>>>>>> are
> > >>
> > >>>>>>>>>>> not (significant) code changes I will not re-raise the vote.
> > >>
> > >>>>>>>>>>>> 2) Only after the positive voting on the IPMC and
> > >> finalisation I
> > >>
> > >>>>>> will
> > >>
> > >>>>>>>>>>> rebrand the RC to Release.
> > >>
> > >>>>>>>>>>>> 3) I will upload it to the incubator release page, then the
> > >> tar
> > >>
> > >>>>>> ball
> > >>
> > >>>>>>>>>>> needs to propagate to the mirrors.
> > >>
> > >>>>>>>>>>>> 4) Update the website (can someone volunteer please?)
> > >>
> > >>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to pypi. It seems
> > >> we
> > >>
> > >>>> can
> > >>
> > >>>>>>>>>> keep
> > >>
> > >>>>>>>>>>> the apache branding as lib cloud is doing this as well (
> > >>
> > >>>>>>>>>>> https://libcloud.apache.org/downloads.html#pypi-package <
> https://libcloud.apache.org/downloads.html#pypi-package> <
> > >>
> > >>>>>>>>>>> https://libcloud.apache.org/downloads.html#pypi-package <
> https://libcloud.apache.org/downloads.html#pypi-package>>).
> > >>
> > >>>>>>>>>>>>
> > >>
> > >>>>>>>>>>>> Jippie!
> > >>
> > >>>>>>>>>>>>
> > >>
> > >>>>>>>>>>>> Bolke
> > >>
> > >>>>>>>>>>>
> > >>
> > >>>>>>>>>>>
> > >>
> > >>>>>>>>>>
> > >>
> > >>>>>>>>>
> > >>
> > >>>>>>>>>
> > >>
> > >>>>>>>
> > >>
> > >>>>>>
> > >>
> > >>>>
> > >>
> > >>>
> > >>
> >
> >
>
>
