For sure Fokko! I'll go through the PRs after I finish reading the one for AIP-24.

AIP-17 does need quite a few rewrites, but I think we're pretty close. We plan to roll it out in our production cluster and then open source it once we believe it is stable. At the moment we're doing it by reusing the task_instance table, and we expect to see a big drop in DB load, as we believe the huge number of heartbeats is the biggest contributor to DB load and connection issues. @yingbo.w...@airbnb.com <yingbo.w...@airbnb.com> can help provide more details.

Being able to reduce task startup overhead is great, I think, especially for users of the K8S executor, but I guess it would not help much in the sensor case, since sensors tend to be relatively long-running tasks and don't get scheduled that often.

I agree we should not wait too long with 2.0, especially since those two items can expand into large changes. As long as we acknowledge the importance of the two items and keep them on our radar, I'm happy.

Cheers,
Kevin Y

On Mon, Oct 21, 2019 at 7:34 AM James Meickle <jmeic...@quantopian.com.invalid> wrote:

> I would feel better about a faster 2.0 release if we had a better plan for
> how often we'll do future major version increments. As-is this might be the
> first change to break backwards compat meaningfully in a while.
>
> On Mon, Oct 21, 2019 at 3:03 AM Driesprong, Fokko <fo...@driesprong.frl>
> wrote:
>
> > Thanks Kevin,
> >
> > Kevin, would love to have your input on this
> > <https://github.com/apache/airflow/pull/6210> PR. This one tries to
> > implement an async implementation of the operator, based on the sensor by
> > Seelman. And also this <https://github.com/apache/airflow/pull/6370> one,
> > which is required to make it work.
> >
> > For me, the most important question is how we are going to batch these poke
> > operations in a way that doesn't add too much complexity. AIP-17 sounds
> > like a great idea but requires a lot of rewriting and also adds another
> > table on which we keep state (which also will add load to the DB).
> > Also, Ash has some optimizations that reduce the overhead of starting a task,
> > which might also partially mitigate the problem of the overhead when
> > starting a task.
> >
> > Personally I feel that we should not wait too long with the 2.0 release,
> > and not try to cram everything in there. Right now we're already
> > backporting a lot to 1.10, and resolving the conflicts is getting
> > more tedious. This already broke the 1.10.4 release. The master branch
> > already has a lot of new stuff in there that is just waiting to be
> > released.
> >
> > Cheers, Fokko
> >
> >
> > On Mon, Oct 21, 2019 at 06:04, Kevin Yang <yrql...@gmail.com> wrote:
> >
> > > Thanks Ash for putting together the doc. Somehow I cannot do anything on
> > > Confluence, so I'll put my comments here.
> > >
> > > +1 for using this opportunity to define how we want to do releases, e.g.
> > > frequency, compatibility rules, etc.
> > >
> > > If the DAG isolation is being worked on I would love to see it in 2.0.
> > >
> > > Adding two other items I think are quite important:
> > >
> > >    - DB reliability/performance
> > >       - The DB is a single point of failure just like the scheduler, and in
> > >       our experience operating a huge cluster at Airbnb (6k+ DAGs/60k+
> > >       tasks), it is a bigger threat to the stability of Airflow
> > >       - If the reason behind improving scheduler performance is
> > >       scalability, then I think we can instead work on the DB, or
> > >       something like AIP-17
> > >       <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-17+Airflow+sensor+optimization>
> > >    - Project baseline
> > >       - As we grow more mature in doing releases, we should consider
> > >       establishing a baseline for Airflow and thus create an easier upgrade
> > >       experience, e.g. performance benchmarking, defining the API (not the
> > >       web endpoint but the API as in how each operator's params are used)
> > >       and tests on them, etc.
> > >       - This does not necessarily need to be fully included in 2.0, as I
> > >       imagine it would be long, incremental work, but the earlier we start
> > >       the earlier we benefit
> > >
> > >
> > > Cheers,
> > > Kevin Y
> > >
> > > On Wed, Oct 9, 2019 at 7:00 PM Chao-Han Tsai <milton0...@gmail.com> wrote:
> > >
> > > > Although Airflow has the concept of task priority like Ash mentioned, it
> > > > does not pre-empt running tasks.
> > > >
> > > > On Wed, Oct 9, 2019 at 12:42 AM Ash Berlin-Taylor <a...@apache.org> wrote:
> > > >
> > > > > There's already a concept called priority_weight on tasks
> > > > > http://airflow.apache.org/concepts.html?highlight=priority_weight#pools
> > > > > (the doc about it is in relation to pools, but everything is run in a
> > > > > pool of "default_pool" if not specified.)
> > > > >
> > > > > Is that what you want?
> > > > >
> > > > > On 9 October 2019 07:38:38 BST, bharath palaksha <bharath...@gmail.com>
> > > > > wrote:
> > > > > >Hi,
> > > > > >
> > > > > >Is there any discussion thread on adding priority to tasks and
> > > > > >cost-based optimization?
> > > > > >Priority and pre-emption as an option to the user. If priority is
> > > > > >specified, the scheduler has to schedule high priority tasks, and if
> > > > > >pre-emption is true, it can pre-empt a currently running task which is
> > > > > >of lower priority
> > > > > >
> > > > > >Thanks,
> > > > > >Bharath
> > > > > >
> > > > > >
> > > > > >On Mon, Sep 30, 2019 at 11:19 PM James Meickle
> > > > > ><jmeic...@quantopian.com.invalid> wrote:
> > > > > >
> > > > > >> For what I'm looking for out of a 2.0, as an operator/cluster admin
> > > > > >> (separate from what I'd like to see as a DAG developer), I'd love to
> > > > > >> see:
> > > > > >>
> > > > > >> - Combine breaking changes into 2.0, and do as few as possible after
> > > > > >> - A semver policy for 2.0 and onwards.
> > > > > >> (For instance we got bit hard by a breaking API change in the k8s
> > > > > >> operator)
> > > > > >> - Regularly scheduled releases (like: "minor every other month, major
> > > > > >> every other year")
> > > > > >> - A security backport policy
> > > > > >> - Pinned deps for releases
> > > > > >> - A way to get integration/cloud vendor operator updates out-of-tree,
> > > > > >> without having to pull in unrelated Airflow updates
> > > > > >>
> > > > > >> For a lot of people, Airflow is an off-the-shelf app rather than a
> > > > > >> library, but we don't actually ship or support it anything like most
> > > > > >> comparable off-the-shelf apps. It makes it much harder to support than
> > > > > >> other applications, unless you're a Python developer yourself.
> > > > > >>
> > > > > >> On Mon, Sep 30, 2019 at 11:18 AM Jarek Potiuk
> > > > > >> <jarek.pot...@polidea.com> wrote:
> > > > > >>
> > > > > >> > All those are very important and we are going to work on some of
> > > > > >> > them as well.
> > > > > >> >
> > > > > >> > I think if there are breaking changes, we should rather try to fit
> > > > > >> > them into the 2.0 release - at least to the point that they can be a
> > > > > >> > base for extending it in later versions in a backwards-compatible way
> > > > > >> > (maybe then we should adopt SemVer officially and follow it).
> > > > > >> >
> > > > > >> > J.
> > > > > >> >
> > > > > >> >
> > > > > >> > On Tue, Sep 24, 2019 at 11:52 PM James Meickle
> > > > > >> > <jmeic...@quantopian.com.invalid> wrote:
> > > > > >> >
> > > > > >> > > My question with that is, how often do we want to do major version
> > > > > >> > > increments?
> > > > > >> > > There are a few API-breaking changes I'd love to see, but
> > > > > >> > > whether to propose them for 2.0 depends on what the wait until
> > > > > >> > > 3.0 looks like (or whether we'll allow more minor version
> > > > > >> > > breakages in the future)
> > > > > >> > >
> > > > > >> > > On Tue, Sep 24, 2019, 11:44 Dan Davydov
> > > > > >> > > <ddavy...@twitter.com.invalid> wrote:
> > > > > >> > >
> > > > > >> > > > I think along with "Improve Webserver Performance" we should
> > > > > >> > > > solve the serialization and task execution isolation problems a
> > > > > >> > > > little bit more completely since I imagine there could be
> > > > > >> > > > backwards compatibility issues, e.g. mapping each task JSON to a
> > > > > >> > > > Docker image or other serialized representation that workers
> > > > > >> > > > would then consume. See the attached PDF. AIP-24 is a subset of
> > > > > >> > > > the DAG Definition Serialization work, but in my opinion we
> > > > > >> > > > should still work on DAG Isolation too. My only concern is that
> > > > > >> > > > the scope is too big for 2.0.
> > > > > >> > > >
> > > > > >> > > > cc @Sumit Maheshwari <smaheshw...@twitter.com> who is also
> > > > > >> > > > looking at tackling some of these problems.
> > > > > >> > > >
> > > > > >> > > > On Tue, Sep 24, 2019 at 9:47 AM Ash Berlin-Taylor
> > > > > >> > > > <a...@apache.org> wrote:
> > > > > >> > > >
> > > > > >> > > >> I'm also in favour of py-test (and it's what I use for most of
> > > > > >> > > >> my development) which is why I created
> > > > > >> > > >> https://issues.apache.org/jira/browse/AIRFLOW-4863, but I don't
> > > > > >> > > >> think non-user-facing/impacting changes need to go on the road
> > > > > >> > > >> map.
> > > > > >> > > >>
> > > > > >> > > >> -ash
> > > > > >> > > >>
> > > > > >> > > >> > On 24 Sep 2019, at 13:53, Tomasz Urbaszek
> > > > > >> > > >> > <tomasz.urbas...@polidea.com> wrote:
> > > > > >> > > >> >
> > > > > >> > > >> > I am thinking about proposing a migration from nosetest to
> > > > > >> > > >> > pytest because it's "more up to date". I even have a POC, but a
> > > > > >> > > >> > lot of tests fail, probably due to side effects.
> > > > > >> > > >> >
> > > > > >> > > >> > Best,
> > > > > >> > > >> > Tomek
> > > > > >> > > >> >
> > > > > >> > > >> > On Tue, Sep 24, 2019 at 2:38 PM Ash Berlin-Taylor
> > > > > >> > > >> > <a...@apache.org> wrote:
> > > > > >> > > >> >
> > > > > >> > > >> >> That formatted very badly in plain text. The list was:
> > > > > >> > > >> >>
> > > > > >> > > >> >> • Knative Executor (AIP-25, currently draft. Being worked on
> > > > > >> > > >> >> by Daniel Imberman)
> > > > > >> > > >> >> • Improve Webserver performance (AIP-24, currently draft.
> > > > > >> > > >> >> Being worked on by myself, Kaxil Naik and Zhou Fang)
> > > > > >> > > >> >> • Enhanced real-time UI
> > > > > >> > > >> >> • Improve Scheduler performance
> > > > > >> > > >> >> • Extend/finish the API (AIP-13 is part of this, but not all)
> > > > > >> > > >> >> • Production Docker image + Helm chart
> > > > > >> > > >> >>
> > > > > >> > > >> >>> On 24 Sep 2019, at 13:36, Ash Berlin-Taylor <a...@apache.org>
> > > > > >> > > >> >>> wrote:
> > > > > >> > > >> >>>
> > > > > >> > > >> >>> Hi everyone,
> > > > > >> > > >> >>>
> > > > > >> > > >> >>> I'd like to start working on a concrete plan to get Airflow 2.0
> > > > > >> > > >> >>> out, and as a result I've started updating
> > > > > >> > > >> >>> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+2.0
> > > > > >> > > >> >>>
> > > > > >> > > >> >>> In addition to all the tidy up work ("spring cleaning", finish
> > > > > >> > > >> >>> tidy up after dropping Py2 etc) I'd propose the following 6 high
> > > > > >> > > >> >>> level items:
> > > > > >> > > >> >>>
> > > > > >> > > >> >>> Knative Executor (AIP-25, currently draft. Being worked on by
> > > > > >> > > >> >>> Daniel Imberman)
> > > > > >> > > >> >>> Improve Webserver performance (AIP-24, currently draft.
> > > > > >> > > >> >>> Being worked on by myself, Kaxil Naik and Zhou Fang)
> > > > > >> > > >> >>> Enhanced real-time UI
> > > > > >> > > >> >>> Improve Scheduler performance
> > > > > >> > > >> >>> Extend/finish the API (AIP-13 is part of this, but not all)
> > > > > >> > > >> >>> Production Docker image + Helm chart
> > > > > >> > > >> >>>
> > > > > >> > > >> >>> We at Astronomer are committing to work on these in roughly this
> > > > > >> > > >> >>> order if no one else gets to them first. I also propose that we
> > > > > >> > > >> >>> create SIGs (Special Interest Groups) in slack with
> > > > > >> > > >> >>> weekly/fortnightly (every 14 days) "calls"/update sessions. We
> > > > > >> > > >> >>> already have #sig-ui and #sig-dag-serialisation.
> > > > > >> > > >> >>>
> > > > > >> > > >> >>> This roadmap is also not a promise that all of these will be
> > > > > >> > > >> >>> done before Airflow 2.0 - we may decide later to push something
> > > > > >> > > >> >>> back to v2.1 etc.
> > > > > >> > > >> >>>
> > > > > >> > > >> >>> Does anyone disagree strongly with these priorities, or have
> > > > > >> > > >> >>> anything they want to add that you are willing to work on?
> > > > > >> > > >> >>>
> > > > > >> > > >> >>> Thanks,
> > > > > >> > > >> >>> Ash
> > > > > >> > > >> >>
> > > > > >> > > >> >>
> > > > > >> > > >> >
> > > > > >> > > >> > --
> > > > > >> > > >> >
> > > > > >> > > >> > Tomasz Urbaszek
> > > > > >> > > >> > Polidea <https://www.polidea.com/> | Junior Software Engineer
> > > > > >> > > >> >
> > > > > >> > > >> > M: +48 505 628 493 <+48505628493>
> > > > > >> > > >> > E: tomasz.urbas...@polidea.com <tomasz.urbasz...@polidea.com>
> > > > > >> > > >> >
> > > > > >> > > >> > Unique Tech
> > > > > >> > > >> > Check out our projects! <https://www.polidea.com/our-work>
> > > > > >> > > >>
> > > > > >> > > >>
> > > > > >> > >
> > > > > >> >
> > > > > >> >
> > > > > >> > --
> > > > > >> >
> > > > > >> > Jarek Potiuk
> > > > > >> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > >> >
> > > > > >> > M: +48 660 796 129 <+48660796129>
> > > > > >> > [image: Polidea] <https://www.polidea.com/>
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Chao-Han Tsai
> > > >
> > >
> >