Removing the overhead of starting processes would benefit not only the k8s executor, but also the workers that spawn subprocesses.
I would definitely be interested to see some numbers on the improvement from AIP-17 in practice. Maybe we should build a benchmark to see whether we have introduced a performance regression in the current master. Maybe we can do something similar to the Apache Spark project and create a preview release for Airflow 2.0.

Cheers, Fokko

On Tue, Oct 22, 2019 at 04:58, Kevin Yang <yrql...@gmail.com> wrote:

> For sure Fokko! I'll go through the PRs after finishing reading the one for AIP-24.
>
> AIP-17 does need quite some rewrites but I think we're pretty close. We plan to roll it out in our production cluster and then open source it once we believe it is stable. At the moment we're doing it by reusing the task_instance table, and we expect to see a big drop in DB load, as we believe the huge number of heartbeats is the biggest contributor to DB load and connection issues. @yingbo.w...@airbnb.com <yingbo.w...@airbnb.com> can help to provide more details.
>
> Being able to reduce task start-up overhead is great, I think, especially for users of the K8S executor, but I guess it would not help much in the sensor case, since sensors tend to be relatively long-running tasks and don't get scheduled that often.
>
> I agree we should not wait too long with 2.0, esp. since those two items can expand into large changes. As long as we acknowledge the importance of the two items and keep them on our radar I'm happy.
>
> Cheers,
> Kevin Y
>
> On Mon, Oct 21, 2019 at 7:34 AM James Meickle <jmeic...@quantopian.com.invalid> wrote:
>
> > I would feel better about a faster 2.0 release if we had a better plan for how often we'll do future major version increments. As-is this might be the first change to break backwards compat meaningfully in a while.
> >
> > On Mon, Oct 21, 2019 at 3:03 AM Driesprong, Fokko <fo...@driesprong.frl> wrote:
> >
> > > Thanks Kevin,
> > >
> > > Kevin, I would love to have your input on this <https://github.com/apache/airflow/pull/6210> PR. This one tries to implement an async implementation of the operator, based on the sensor by Seelman. And also this <https://github.com/apache/airflow/pull/6370> one, which is required to make it work.
> > >
> > > For me, the most important question is how we are going to batch these poke operations in a way that doesn't add too much complexity. AIP-17 sounds like a great idea but requires a lot of rewriting and also adds another table on which we keep state (which will also add load to the DB). Also, Ash has some optimizations that reduce the overhead of starting a task, which might partially mitigate the problem.
> > >
> > > Personally I feel that we should not wait too long with the 2.0 release, and not try to cram everything in there. Right now we're already backporting a lot to 1.10 and resolving the conflicts is getting more tedious. This already broke the 1.10.4 release. The master branch already has a lot of new stuff in there that is just waiting to be released.
> > >
> > > Cheers, Fokko
> > >
> > > On Mon, Oct 21, 2019 at 06:04, Kevin Yang <yrql...@gmail.com> wrote:
> > >
> > > > Thanks Ash for putting together the doc. Somehow I cannot do anything on Confluence, so I'll put my comments here.
> > > >
> > > > +1 for using this opportunity to define how we want to do releases, e.g. frequency, compatibility rules, etc.
> > > >
> > > > If the DAG isolation is being worked on I would love to see it in 2.0.
> > > >
> > > > Adding two other items I think are quite important:
> > > >
> > > > - DB reliability/performance
> > > >   - The DB is a single point of failure just like the scheduler and, from experience operating a huge cluster at Airbnb (6k+ DAGs/60k+ tasks), it is a bigger threat to the stability of Airflow
> > > >   - If the reason behind improving scheduler performance is scalability then I think we can instead work on the DB, or something like AIP-17 <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-17+Airflow+sensor+optimization>
> > > > - Project baseline
> > > >   - As we grow more mature at doing releases, we should consider establishing a baseline for Airflow and thus create an easier upgrade experience, e.g. performance benchmarking, defining the API (not the web endpoint but the API as in how each operator's params are used) and tests on them, etc.
> > > >   - This does not necessarily need to be fully included in 2.0, as I imagine it would be long incremental work, but the earlier we start the earlier we benefit
> > > >
> > > > Cheers,
> > > > Kevin Y
> > > >
> > > > On Wed, Oct 9, 2019 at 7:00 PM Chao-Han Tsai <milton0...@gmail.com> wrote:
> > > >
> > > > > Although Airflow has the concept of task priority like Ash mentioned, it does not pre-empt running tasks.
> > > > >
> > > > > On Wed, Oct 9, 2019 at 12:42 AM Ash Berlin-Taylor <a...@apache.org> wrote:
> > > > >
> > > > > > There's already a concept called priority_weight on tasks
> > > > > >
> > > > > > http://airflow.apache.org/concepts.html?highlight=priority_weight#pools
> > > > > > (the doc about it is in relation to pools, but everything is run in a pool of "default_pool" if not specified.)
> > > > > >
> > > > > > Is that what you want?
> > > > > >
> > > > > > On 9 October 2019 07:38:38 BST, bharath palaksha <bharath...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Is there any discussion thread on adding priority to tasks and cost-based optimization? I mean priority and pre-emption as options for the user. If priority is specified, the scheduler has to schedule high-priority tasks first, and if pre-emption is true, it can pre-empt a currently running task of lower priority.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Bharath
> > > > > > >
> > > > > > > On Mon, Sep 30, 2019 at 11:19 PM James Meickle <jmeic...@quantopian.com.invalid> wrote:
> > > > > > >
> > > > > > > > For what I'm looking for out of a 2.0, as an operator/cluster admin (separate from what I'd like to see as a DAG developer), I'd love to see:
> > > > > > > >
> > > > > > > > - Combine breaking changes into 2.0, and do as few as possible after
> > > > > > > > - A semver policy for 2.0 and onwards. (For instance we got bitten hard by a breaking API change in the k8s operator)
> > > > > > > > - Regularly scheduled releases (like: "minor every other month, major every other year")
> > > > > > > > - A security backport policy
> > > > > > > > - Pinned deps for releases
> > > > > > > > - A way to get integration/cloud vendor operator updates out-of-tree, without having to pull in unrelated Airflow updates
> > > > > > > >
> > > > > > > > For a lot of people, Airflow is an off-the-shelf app rather than a library, but we don't actually ship or support it anything like most comparable off-the-shelf apps.
> > > > > > > > It makes it much harder to support than other applications, unless you're a Python developer yourself.
> > > > > > > >
> > > > > > > > On Mon, Sep 30, 2019 at 11:18 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > > > > > >
> > > > > > > > > All those are very important and we are going to work on some of them as well.
> > > > > > > > >
> > > > > > > > > I think if there are breaking changes, we should rather try to fit them in the 2.0 release, at least to the point that they can be the base for extending it in later versions in a backwards-compatible way (maybe then we should adopt SemVer officially and follow it).
> > > > > > > > >
> > > > > > > > > J.
> > > > > > > > >
> > > > > > > > > On Tue, Sep 24, 2019 at 11:52 PM James Meickle <jmeic...@quantopian.com.invalid> wrote:
> > > > > > > > >
> > > > > > > > > > My question with that is, how often do we want to do major version increments?
> > > > > > > > > > There's a few API breaking changes I'd love to see, but whether to propose them for 2.0 depends on what the wait until 3.0 looks like (or whether we'll allow more minor version breakages in the future)
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 24, 2019, 11:44 Dan Davydov <ddavy...@twitter.com.invalid> wrote:
> > > > > > > > > >
> > > > > > > > > > > I think along with "Improve Webserver Performance" we should solve the serialization and task execution isolation problems a little bit more completely since I imagine there could be backwards compatibility issues, e.g. mapping each task JSON to a Docker image or other serialized representation that workers would then consume. See the attached PDF; AIP-24 is a subset of the DAG Definition Serialization work, but in my opinion we should still work on DAG Isolation too. My only concern is that the scope is too big for 2.0.
> > > > > > > > > > >
> > > > > > > > > > > cc @Sumit Maheshwari <smaheshw...@twitter.com> who is also looking at tackling some of these problems.
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 24, 2019 at 9:47 AM Ash Berlin-Taylor <a...@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I'm also in favour of py-test (and it's what I use for most of my development), which is why I created https://issues.apache.org/jira/browse/AIRFLOW-4863, but I don't think non-user-facing/impacting changes need to go on the road map.
> > > > > > > > > > > >
> > > > > > > > > > > > -ash
> > > > > > > > > > > >
> > > > > > > > > > > > On 24 Sep 2019, at 13:53, Tomasz Urbaszek <tomasz.urbas...@polidea.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I am thinking about proposing a migration from nosetest to pytest because it's "more up to date". I even have a POC, but a lot of tests fail, probably due to side effects.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Tomek
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Sep 24, 2019 at 2:38 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > That formatted very badly in plain text. The list was:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > • Knative Executor (AIP-25, currently draft. Being worked on by Daniel Imberman)
> > > > > > > > > > > > > > • Improve Webserver performance (AIP-24, currently draft.
> > > > > > > > > > > > > >   Being worked on by myself, Kaxil Naik and Zhou Fang)
> > > > > > > > > > > > > > • Enhanced real-time UI
> > > > > > > > > > > > > > • Improve Scheduler performance
> > > > > > > > > > > > > > • Extend/finish the API (AIP-13 is part of this, but not all)
> > > > > > > > > > > > > > • Production Docker image + Helm chart
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On 24 Sep 2019, at 13:36, Ash Berlin-Taylor <a...@apache.org> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'd like to start working on a concrete plan to get Airflow 2.0 out, and as a result I've started updating https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+2.0
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In addition to all the tidy up work ("spring cleaning", finish tidy up after dropping Py2 etc) I'd propose the following 6 high level items:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Knative Executor (AIP-25, currently draft. Being worked on by Daniel Imberman)
> > > > > > > > > > > > > > > Improve Webserver performance (AIP-24, currently draft.
> > > > > > > > > > > > > > > Being worked on by myself, Kaxil Naik and Zhou Fang)
> > > > > > > > > > > > > > > Enhanced real-time UI
> > > > > > > > > > > > > > > Improve Scheduler performance
> > > > > > > > > > > > > > > Extend/finish the API (AIP-13 is part of this, but not all)
> > > > > > > > > > > > > > > Production Docker image + Helm chart
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We at Astronomer are committing to work on these in roughly this order if no one else gets to them first. I also propose that we create SIGs (Special Interest Groups) in slack with weekly/fortnightly (every 14 days) "calls"/update sessions. We already have #sig-ui and #sig-dag-serialisation.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This roadmap is also not a promise that all of these will be done before Airflow 2.0 - we may decide later to push something back to v2.1 etc.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Does anyone disagree strongly with these priorities, or have anything they want to add that you are willing to work on?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Ash
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Tomasz Urbaszek
> > > > > > > > > > > > > Polidea <https://www.polidea.com/> | Junior Software Engineer
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Jarek Potiuk
> > > > > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > >
> > > > > --
> > > > > Chao-Han Tsai