Yep. Definitely - part of AIP-1 :). Having the Executor extended to run all kinds of "workloads" is a great idea!
And I love the comments - re the Fargate and Batch cases - it's really cool to see the different perspectives here. We definitely need to have more such discussions :)

On Fri, Nov 26, 2021 at 3:06 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> This split Fargate/Lambda executor idea has some relevance for the AIP-1/multi-tenancy discussion too.
>
> One of the things I had been considering for that is that we need to move DAG-level callbacks out of the scheduler (currently they are run via the parsing process that runs on each scheduler), as we can't have scheduler nodes running any user code in multi-tenancy, for security reasons.
>
> So my idea here is that we extend the role of the Executor to be "run workloads" -- whether that is "execute this TI" or "run this DAG SLA miss callback". Crucially, it _doesn't_ have to run them all the same way, so a BaseExecutor could write the callbacks into a DB table that processors could pick up (mechanism TBD), but, crucially, by having it be part of the Executor interface we can subclass it, and in this Fargate/Lambda example we could have callbacks run in Lambdas!
>
> -a
>
> On Thu, Nov 25 2021 at 23:18:17 +0000, "Oliveira, Niko" <oniko...@amazon.com.INVALID> wrote:
>
> > We could even likely think about adding more options of similar kind for GCP/AWS/Azure - using native capabilities of those platforms rather than using generic "Kubernetes" as remote execution. I can imagine using Fargate (the AWS team could contribute it), Cloud Run (the Google team), Azure Container Instances (maybe Microsoft will finally also embrace Airflow :) ). That would make the Airflow architecture more "Multiple Cloud Native".
> >
> > From the AWS side we're very interested and happy to work on something like a Fargate executor; it's on our roadmap either way. But I think a generalized "cloud" or "serverless" executor would make a lot of sense. From AWS alone you may want to execute "small" tasks within a Lambda (quick start-up time, but a small amount of compute and a 15-minute max run time) and then "medium" to "large" tasks in ECS Fargate or Batch (with longer startup times but more compute available), etc. And the same goes for other cloud providers' equivalents. A harmonized and configurable solution could make directing tasks to different execution environments very smooth.
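To make the routing part of that generalized "serverless" executor idea a bit more concrete, here is a rough, purely hypothetical sketch - none of these names are an existing Airflow API, and a real implementation would call e.g. Lambda's Invoke, ECS RunTask (Fargate) or Batch SubmitJob where the stubs below only print:

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical sketch - not the real Airflow BaseExecutor interface.
# Each "backend" is just a callable that takes a task request; in a real
# executor these would call e.g. Lambda Invoke, ECS RunTask or Batch SubmitJob.

@dataclass
class TaskRequest:
    task_id: str
    command: list           # the "airflow tasks run ..." command to execute
    queue: str = "default"  # users can pick the execution environment via queue
    est_runtime_s: int = 60

def run_in_lambda(req: TaskRequest) -> None:
    print(f"[lambda]  would invoke a function for {req.task_id}: {req.command}")

def run_in_fargate(req: TaskRequest) -> None:
    print(f"[fargate] would run an ECS task for {req.task_id}: {req.command}")

def run_in_batch(req: TaskRequest) -> None:
    print(f"[batch]   would submit a job for {req.task_id}: {req.command}")

class ServerlessExecutor:
    """Routes each task to a serverless backend based on its queue or size."""

    LAMBDA_MAX_RUNTIME_S = 15 * 60  # Lambda's hard 15-minute limit

    def __init__(self) -> None:
        self.backends: Dict[str, Callable[[TaskRequest], None]] = {
            "lambda": run_in_lambda,
            "fargate": run_in_fargate,
            "batch": run_in_batch,
        }

    def execute(self, req: TaskRequest) -> None:
        # An explicit queue wins; otherwise fall back to a simple size heuristic.
        if req.queue in self.backends:
            backend = self.backends[req.queue]
        elif req.est_runtime_s < self.LAMBDA_MAX_RUNTIME_S:
            backend = run_in_lambda
        else:
            backend = run_in_fargate
        backend(req)

if __name__ == "__main__":
    executor = ServerlessExecutor()
    executor.execute(TaskRequest("tiny_check", ["airflow", "tasks", "run", "..."], est_runtime_s=30))
    executor.execute(TaskRequest("big_backfill", ["airflow", "tasks", "run", "..."], queue="batch"))
```

The same shape would work for Cloud Run or Azure Container Instances - they would just be additional entries in the backend mapping.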
> ________________________________________
> From: Jarek Potiuk <ja...@potiuk.com>
> Sent: Thursday, November 25, 2021 2:40 AM
> To: dev@airflow.apache.org
> Subject: [DISCUSS] Shaping the future of executors for Airflow (slowly phasing out Celery?)
>
> Hello Everyone,
>
> I recently had some discussions about new features (already implemented, planned, and in progress), and I had a thought that may be worth discussing here. It's very likely many of the people involved have had similar discussions and thoughts, but maybe it's worth spelling it out now and agreeing on a common "direction" we are heading in for the future of Airflow when it comes to executors.
>
> TL;DR: I think the recent changes, and possibly some future improvements and optimisations, can lead us to a situation where we will not need the Celery Executor (nor CeleryKubernetes) and can phase it out eventually - leaving only Local, Kubernetes and the soon-coming LocalKubernetes one.
>
> We might still "support" CeleryExecutor for backwards compatibility and for people who do not want to run Kubernetes, but in a way the main reasons why Celery would be preferred over Kubernetes should be gone soon, IMHO.
>
> Why do I think so? Because I believe the main problems that led to having CeleryExecutor in the first place are largely gone. The main reason why the Celery executor was better than the Kubernetes one was that you could run more short tasks with far less overhead and latency. However, we have now either already implemented, or can easily optimise, ways of significantly decreasing the need to run small tasks via "remote" executors.
>
> The following things have already happened:
>
> 1) We have Deferrable Operators support. Much of the "small task" work - the parts of operators that just wait for something - can already be executed in the triggerer.
> 2) We have an HA scheduler, where you can run multiple schedulers with the Local Executor - thus you can get scalability in LocalExecutor for small tasks.
> 3) We had some optimisations in DummyOperator, where triggering is done in the Scheduler.
>
> What still can be done (or is already being done):
>
> * While the triggerer does not (I believe) support multiple instances for now, it has been designed from the ground up to support HA/scalability.
> * We can rewrite a lot of the operators we have to be Deferrable - especially those that reach out to external services.
> * We can make more "built-in" operators that have declarative behaviour rather than an imperative "execute", and have them evaluated directly in the Scheduler. We had a discussion about it in https://github.com/apache/airflow/pull/19361 - and it looks like it should be possible to implement, for example, a "DayOfWeek" operator that would be evaluated in the Scheduler, where the triggering decisions could be made. We could probably add quite a number of such "optimized" operators that are declarative and evaluated in the scheduler with virtually zero overhead.
> * With the LocalKubernetes executor coming in https://github.com/apache/airflow/pull/19729, combined with HA/scalability of the scheduler (and thus scalability of Local Executors), it seems that any reasonable installation will have enough scalability and capacity to locally execute all the remaining "small tasks" in Local Executors. We could even try to figure out a good pattern for determining which tasks are "small" and automatically using LocalExecutor for them - eventually.
>
> It seems to me that with those upcoming changes, LocalKubernetes should be the default executor in the future, rather than Celery (which is now kind of the de facto "default"). We could even likely think about adding more options of a similar kind for GCP/AWS/Azure - using native capabilities of those platforms rather than generic "Kubernetes" as remote execution. I can imagine using Fargate (the AWS team could contribute it), Cloud Run (the Google team), Azure Container Instances (maybe Microsoft will finally also embrace Airflow :) ). That would make the Airflow architecture more "Multiple Cloud Native".
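As a side note on the "declarative operators evaluated in the Scheduler" idea above: a tiny, hypothetical sketch of what makes that possible - the operator carries only data (the allowed weekdays), so the decision is a pure function of the logical date and no user code has to run on a worker. This is not the implementation discussed in the PR, just an illustration:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet

# Hypothetical sketch of a "declarative" operator: it holds only data,
# so a scheduler could evaluate it directly instead of shipping it to a worker.

WEEKDAYS = ("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")

@dataclass(frozen=True)
class DayOfWeekCheck:
    """Declarative check: 'is the logical date on one of these weekdays?'"""
    allowed_days: FrozenSet[str]

    def evaluate(self, logical_date: datetime) -> bool:
        # Pure function of the logical date - no user code, no side effects,
        # so it is cheap and safe to evaluate inside the scheduler loop.
        return WEEKDAYS[logical_date.weekday()] in self.allowed_days

if __name__ == "__main__":
    check = DayOfWeekCheck(allowed_days=frozenset({"MON", "WED", "FRI"}))
    for day in (1, 2, 3):  # 2021-11-01 was a Monday
        logical_date = datetime(2021, 11, day)
        decision = "run downstream" if check.evaluate(logical_date) else "skip downstream"
        print(f"{logical_date:%Y-%m-%d (%a)}: {decision}")
```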
> Why do I think the Celery Executor should be "gone" (possibly not immediately, but with less priority)? The problem with Celery is that even with KEDA autoscaling, the Celery Executor has big problems with scaling in (we also had discussions about this recently - with the AWS team, among others). Celery is complex and we are using maybe 5% of its capabilities. I also had a recent discussion (at PyWaw, where I gave a talk about Airflow dependencies) with people who use Celery heavily in their product and utilise a lot more of those capabilities - and they are rather unhappy with the problems they have to deal with and the stability of Celery's more complex features.
>
> I'd love to hear what others think on the subject. It would be great to agree on a common "direction" we are heading in and a "vision" of Airflow's future when it comes to executors. I have a feeling that we are just about at a pivotal point where we can all consciously change our paradigm of thinking about Airflow executors and prioritise things differently.
>
> J.
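Coming back to the "run workloads" idea from Ash's mail: a minimal, hypothetical sketch of what such an extended executor interface could look like - a "workload" is either "execute this TI" or "run this callback", and a subclass is free to route each kind differently (e.g. callbacks to Lambdas). None of these names are the current Airflow Executor API:

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical sketch - not the current Airflow Executor API.

@dataclass
class RunTaskInstance:
    dag_id: str
    task_id: str
    run_id: str

@dataclass
class RunCallback:
    dag_id: str
    kind: str  # e.g. "sla_miss", "on_failure"

Workload = Union[RunTaskInstance, RunCallback]

class BaseWorkloadExecutor:
    """Executors accept generic 'workloads' instead of only task instances."""

    def __init__(self) -> None:
        self.queued: List[Workload] = []

    def queue_workload(self, workload: Workload) -> None:
        self.queued.append(workload)

    def heartbeat(self) -> None:
        # Base behaviour: hand every queued workload to the _run() hook.
        # A subclass could instead write callbacks to a DB table, send them
        # to a Lambda, etc. - the scheduler itself never runs user code.
        while self.queued:
            self._run(self.queued.pop(0))

    def _run(self, workload: Workload) -> None:
        raise NotImplementedError

class LoggingExecutor(BaseWorkloadExecutor):
    def _run(self, workload: Workload) -> None:
        if isinstance(workload, RunCallback):
            print(f"would run {workload.kind} callback for {workload.dag_id} in a Lambda")
        else:
            print(f"would run TI {workload.dag_id}.{workload.task_id} ({workload.run_id}) on Fargate")

if __name__ == "__main__":
    executor = LoggingExecutor()
    executor.queue_workload(RunTaskInstance("example_dag", "extract", "manual__2021-11-26"))
    executor.queue_workload(RunCallback("example_dag", kind="sla_miss"))
    executor.heartbeat()
```

The point of the sketch is only that the dispatch decision lives in the executor subclass, so the scheduler never has to execute user code itself - which is exactly what the multi-tenancy work needs.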