I agree with disabling the building of broken DAGs and also disabling / not supporting orphaned Operators/TaskInstances. They create so many issues down the line, and the fixes are relatively easy. It will make Airflow and the DAGs in use more maintainable.
So +1, just make sure it is well documented in UPDATING.md. My 2 cents.

Bolke

> On 6 May 2016, at 07:22, Jeremiah Lowin <[email protected]> wrote:
>
> Tonight I was working with Dan on fixing the speed regression with large DAGs: https://github.com/apache/incubator-airflow/pull/1470. That clears the first blocker for 1.7.1 as described in AIRFLOW-52.
>
> We wanted to ask for the group's thoughts on the second blocker. Basically, the issue centers on this pattern:
>
> ```python
> # email is an operator WITHOUT a DAG.
> email = Operator(...)
>
> # create dependencies for email
> email.set_upstream(dag.roots)
>
> # add email to the DAG
> dag.add_task(email)
> ```
>
> Why is this a problem? Under Airflow 1.7.0, this DAG is actually completely broken after the set_upstream command, because it has a dependency on a task that's not in the DAG. It can't be run and will even raise an exception if you do something simple like access dag.roots. HOWEVER, building this broken DAG is allowed in 1.7.0, and the user cures it in the last line by explicitly adding the task.
>
> In https://github.com/apache/incubator-airflow/pull/1318, which will be part of 1.7.1, I took steps to prevent users from creating broken DAGs at all. The relevant fix in this case is that the email task would be automatically added to the DAG (it would infer its membership from the tasks in dag.roots). However, once that inference is made, the last line becomes illegal, since you can't add a task to a DAG it's already in.
>
> So here's the thing: because the last line becomes illegal, this code snippet will no longer run under 1.7.1. My understanding is that it is being used in production at Airbnb, so I wanted to raise the issue to see if we can get comfortable with the change.
> My opinion is that being able to build a broken DAG is *always* a bug, and so the issue should be fixed even if that creates some incompatibilities for anyone exploiting it. Particularly in this case, where the remedy is simply to delete the last line.
>
> We thought about deprecating the behavior, but I don't see how we can, because 1) we don't know for sure that the user is trying to do something illegal at the time of the set_upstream call, and 2) the guard against double-adding a task to a DAG has been in Airflow for a very long time, so reverting it would constitute a really massive behavior change.
>
> So my vote is to proceed with the fix, but as it could potentially inconvenience the hand that feeds (and by "feeds" I mean "gave us Airflow"), I'd like to be sensitive to their needs.
>
> J
>
> On Thu, May 5, 2016 at 2:09 PM, Dan Davydov <[email protected]> wrote:
>
>> Moved discussion to https://issues.apache.org/jira/browse/AIRFLOW-52 and updated the status of the task there.
>>
>> On Tue, May 3, 2016 at 2:32 AM, Dan Davydov <[email protected]> wrote:
>>
>>> It's per DAG unfortunately (we have some pretty funky DAGs here).
>>>
>>> On May 2, 2016 10:26 PM, "Bolke de Bruin" <[email protected]> wrote:
>>>
>>>> Hi Dan,
>>>>
>>>> Is that per DAG or per DagBag? Multiprocessing should parallelize DAG parsing, so I am very curious. Let me know if I can help out.
>>>>
>>>> Bolke
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On 3 May 2016, at 01:47, Dan Davydov <[email protected]> wrote:
>>>>>
>>>>> So a quick update: unfortunately we saw some DAGBag parsing time increases (~10x for some DAGs) on the webservers with 1.7.1rc3. Because of this I will be working on a staging cluster that has a copy of our production DAGBag and is a copy of our production Airflow infrastructure, just without the workers. This will let us debug the release outside of production.
>>>>>
>>>>> On Thu, Apr 28, 2016 at 10:20 AM, Dan Davydov <[email protected]> wrote:
>>>>>
>>>>>> Definitely, here were the issues we hit:
>>>>>> - airbnb/airflow#1365 occurred
>>>>>> - Webservers/scheduler were timing out and stuck in restart cycles due to increased time spent on parsing DAGs, caused by airbnb/airflow#1213/files
>>>>>> - Failed tasks that ran after the upgrade and the revert (after we reverted the upgrade) could not be cleared (but running the tasks through the UI worked without clearing them)
>>>>>> - The way log files were stored on S3 was changed (Airflow now requires a connection to be set up), which broke log storage
>>>>>> - Some DAGs were broken (unable to be parsed) due to package reorganization in open source (the import paths were changed in the utils refactor commit)
>>>>>>
>>>>>> On Thu, Apr 28, 2016 at 12:17 AM, Bolke de Bruin <[email protected]> wrote:
>>>>>>
>>>>>>> Dan,
>>>>>>>
>>>>>>> Are you able to share some of the bugs you have been hitting and the connected commits?
>>>>>>>
>>>>>>> We could at the very least learn from them and maybe even improve testing.
>>>>>>>
>>>>>>> Bolke
>>>>>>>
>>>>>>>> On 28 Apr 2016, at 06:51, Dan Davydov <[email protected]> wrote:
>>>>>>>>
>>>>>>>> All of the blockers were fixed as of yesterday (there was some issue that Jeremiah was looking at with the last release candidate, which I think is fixed, but I'm not sure). I started staging the airbnb_1.7.1rc3 tag earlier today, so as long as metrics look OK and the 1.7.1rc2 issues seem resolved tomorrow, I will release internally either tomorrow or Monday (we try to avoid releases on Fridays). If there aren't any issues, we can push the 1.7.1 tag on Monday/Tuesday.
>>>>>>>>
>>>>>>>> @Sid
>>>>>>>> I think we were originally aiming to deploy internally once every two weeks, but we decided to do it once a month in the end. I'm not too sure about that, so Max can comment there.
>>>>>>>>
>>>>>>>> We have been running 1.7.0 in production for about a month now and it is stable.
>>>>>>>>
>>>>>>>> I think what really slowed down this release cycle is some commits that caused severe bugs that we decided to roll forward with instead of rolling back. Next time we can potentially try reverting such commits while the fixes are applied for the next version, although this is not always trivial to do.
>>>>>>>>
>>>>>>>> On Wed, Apr 27, 2016 at 9:31 PM, Siddharth Anand <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Btw, is anyone of the committers running 1.7.0 or later in any staging or production env? I have to say that the fact that 1.6.2 is the most stable release and is four or more months old does not say much for our release cadence or process. What's our plan for 1.7.1?
>>>>>>>>>
>>>>>>>>> Sent from Sid's iPhone
>>>>>>>>>
>>>>>>>>>> On Apr 27, 2016, at 9:05 PM, Chris Riccomini <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hey all,
>>>>>>>>>>
>>>>>>>>>> I just wanted to check in on the 1.7.1 release status. I know there have been some major-ish bugs, as well as several people doing tests. Should we create a 1.7.1 release JIRA and track outstanding issues there?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Chris
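The 1.7.0-vs-1.7.1 behavior debated above can be sketched with a minimal standalone model. The `Dag`/`Task` classes below are hypothetical stand-ins, not Airflow's real implementation; they only illustrate why, once `set_upstream` infers DAG membership, the explicit `dag.add_task(email)` line trips the long-standing double-add guard, and why the remedy is simply to delete that line:

```python
# Hypothetical minimal model of the 1.7.1 membership-inference behavior
# described in the thread (not Airflow's actual classes or API).

class Dag:
    def __init__(self):
        self.tasks = []

    def add_task(self, task):
        # The long-standing guard against double-adding a task to a DAG.
        if task in self.tasks:
            raise ValueError("task already in DAG")
        self.tasks.append(task)
        task.dag = self


class Task:
    def __init__(self, name):
        self.name = name
        self.dag = None
        self.upstream = []

    def set_upstream(self, tasks):
        for t in tasks:
            self.upstream.append(t)
            # 1.7.1-style fix: infer DAG membership from the upstream task,
            # so the DAG is never left in a broken intermediate state.
            if self.dag is None and t.dag is not None:
                t.dag.add_task(self)


dag = Dag()
root = Task("root")
dag.add_task(root)

email = Task("email")
email.set_upstream([root])   # email is added to dag automatically here

assert email.dag is dag      # membership was inferred
# dag.add_task(email)        # under this model, would now raise ValueError
```

Under 1.7.0 the inference step did not exist, so the final explicit `add_task` was both legal and necessary; under the modeled 1.7.1 behavior it becomes redundant and therefore illegal, which is the one-line migration discussed above.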
