Yep. Definitely - part of AIP-1 :). Having the Executor extended to run all kinds of "workloads" is a great idea!
And I love the comments - re the Fargate and Batch cases - it's really cool to see the different perspectives here. We definitely need to have more such discussions :)

On Fri, Nov 26, 2021 at 3:06 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> This split Fargate/Lambda executor idea has some relevance for the AIP-1/multi-tenancy discussion too.
>
> One of the things I had been considering for that is that we need to move DAG-level callbacks out of the scheduler (currently they are run via the parsing process that runs on each scheduler), as we can't have scheduler nodes running any user code in multi-tenancy, for security reasons.
>
> So my idea here is that we extend the role of the Executor to be "run workloads" -- whether that is "execute this TI" or "run this DAG SLA miss callback". Crucially, it _doesn't_ have to run them all the same way, so a BaseExecutor could write the callbacks into a DB table that processors could pick up (mechanism TBD), but, crucially, by having it be part of the Executor interface we can subclass it, and in this Fargate/Lambda example we could have callbacks run in Lambdas!
>
> -a
>
> On Thu, Nov 25 2021 at 23:18:17 +0000, "Oliveira, Niko" <oniko...@amazon.com.INVALID> wrote:
>
> > We could even likely think about adding more options of similar kind for GCP/AWS/Azure - using native capabilities of those platforms rather than using generic "Kubernetes" as remote execution. I can imagine using Fargate (the AWS team could contribute it), Cloud Run (the Google team), Azure Container Instances (maybe Microsoft will finally also embrace Airflow :) ). That would make the Airflow architecture more "Multiple Cloud Native".
> >
> > From the AWS side we're very interested and happy to work on something like a Fargate executor; it's on our roadmap either way. But I think a generalized "cloud" or "serverless" executor would make a lot of sense. From AWS alone you may want to execute "small" tasks within a Lambda (quick start-up time, but a small amount of compute and a 15-minute max run time) and then "medium" to "large" tasks in ECS Fargate or Batch (with longer startup times but more compute available), etc. And the same goes for other cloud providers' equivalents. A harmonized and configurable solution could make directing tasks to different execution environments very smooth.
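To make the routing part of that generalized "serverless" executor idea a bit more concrete, here is a rough, purely hypothetical sketch - none of these names are an existing Airflow API, and a real implementation would call e.g. Lambda's Invoke, ECS RunTask (Fargate) or Batch SubmitJob where the stubs below only print:

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical sketch - not the real Airflow BaseExecutor interface.
# Each "backend" is just a callable that takes a task request; in a real
# executor these would call e.g. Lambda Invoke, ECS RunTask or Batch SubmitJob.

@dataclass
class TaskRequest:
    task_id: str
    command: list           # the "airflow tasks run ..." command to execute
    queue: str = "default"  # users can pick the execution environment via queue
    est_runtime_s: int = 60

def run_in_lambda(req: TaskRequest) -> None:
    print(f"[lambda]  would invoke a function for {req.task_id}: {req.command}")

def run_in_fargate(req: TaskRequest) -> None:
    print(f"[fargate] would run an ECS task for {req.task_id}: {req.command}")

def run_in_batch(req: TaskRequest) -> None:
    print(f"[batch]   would submit a job for {req.task_id}: {req.command}")

class ServerlessExecutor:
    """Routes each task to a serverless backend based on its queue or size."""

    LAMBDA_MAX_RUNTIME_S = 15 * 60  # Lambda's hard 15-minute limit

    def __init__(self) -> None:
        self.backends: Dict[str, Callable[[TaskRequest], None]] = {
            "lambda": run_in_lambda,
            "fargate": run_in_fargate,
            "batch": run_in_batch,
        }

    def execute(self, req: TaskRequest) -> None:
        # An explicit queue wins; otherwise fall back to a simple size heuristic.
        if req.queue in self.backends:
            backend = self.backends[req.queue]
        elif req.est_runtime_s < self.LAMBDA_MAX_RUNTIME_S:
            backend = run_in_lambda
        else:
            backend = run_in_fargate
        backend(req)

if __name__ == "__main__":
    executor = ServerlessExecutor()
    executor.execute(TaskRequest("tiny_check", ["airflow", "tasks", "run", "..."], est_runtime_s=30))
    executor.execute(TaskRequest("big_backfill", ["airflow", "tasks", "run", "..."], queue="batch"))
```

The same shape would work for Cloud Run or Azure Container Instances - they would just be additional entries in the backend mapping.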
> ________________________________________
> From: Jarek Potiuk <ja...@potiuk.com>
> Sent: Thursday, November 25, 2021 2:40 AM
> To: dev@airflow.apache.org
> Subject: [DISCUSS] Shaping the future of executors for Airflow (slowly phasing out Celery?)
>
> Hello Everyone,
>
> I recently had some discussions about new features (already implemented, planned, and in progress), and I had a thought that may be worth discussing here. It's very likely many of the people involved have had similar discussions and thoughts, but maybe it's worth spelling it out now and agreeing on a common "direction" we are heading in for the future of Airflow when it comes to executors.
>
> TL;DR: I think the recent changes, and possibly some future improvements and optimisations, can lead us to a situation where we will not need the Celery Executor (nor CeleryKubernetes) and can phase it out eventually - leaving only Local, Kubernetes and the soon-coming LocalKubernetes one.
>
> We might still "support" CeleryExecutor for backwards compatibility and for people who do not want to run Kubernetes, but in a way the main reasons why Celery would be preferred over Kubernetes should be gone soon, IMHO.
>
> Why do I think so? Because I believe the main problems that led to having CeleryExecutor in the first place are largely gone. The main reason why the Celery executor was better than the Kubernetes one was that you could run more short tasks with far less overhead and latency. However, we have now either already implemented, or can easily optimise, ways of significantly decreasing the need to run small tasks via "remote" executors.
>
> The following things have already happened:
>
> 1) We have Deferrable Operators support. Much of the "small task" work - the parts of operators that just wait for something - can already be executed in the triggerer.
> 2) We have an HA scheduler, where you can run multiple schedulers with the Local Executor - thus you can get scalability in LocalExecutor for small tasks.
> 3) We had some optimisations in DummyOperator, where triggering is done in the Scheduler.
>
> What still can be done (or is already being done):
>
> * While the triggerer does not (I believe) support multiple instances for now, it has been designed from the ground up to support HA/scalability.
> * We can rewrite a lot of the operators we have to be Deferrable - especially those that reach out to external services.
> * We can make more "built-in" operators that have declarative behaviour rather than an imperative "execute", and have them evaluated directly in the Scheduler. We had a discussion about it in https://github.com/apache/airflow/pull/19361 - and it looks like it should be possible to implement, for example, a "DayOfWeek" operator that would be evaluated in the Scheduler, where the triggering decisions could be made. We could probably add quite a number of such "optimized" operators that are declarative and evaluated in the scheduler with virtually zero overhead.
> * With the LocalKubernetes executor coming in https://github.com/apache/airflow/pull/19729, combined with HA/scalability of the scheduler (and thus scalability of Local Executors), it seems that any reasonable installation will have enough scalability and capacity to locally execute all the remaining "small tasks" in Local Executors. We could even try to figure out a good pattern for determining which tasks are "small" and automatically using LocalExecutor for them - eventually.
>
> It seems to me that with those upcoming changes, LocalKubernetes should be the default executor in the future, rather than Celery (which is now kind of the de facto "default"). We could even likely think about adding more options of a similar kind for GCP/AWS/Azure - using native capabilities of those platforms rather than generic "Kubernetes" as remote execution. I can imagine using Fargate (the AWS team could contribute it), Cloud Run (the Google team), Azure Container Instances (maybe Microsoft will finally also embrace Airflow :) ). That would make the Airflow architecture more "Multiple Cloud Native".
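As a side note on the "declarative operators evaluated in the Scheduler" idea above: a tiny, hypothetical sketch of what makes that possible - the operator carries only data (the allowed weekdays), so the decision is a pure function of the logical date and no user code has to run on a worker. This is not the implementation discussed in the PR, just an illustration:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet

# Hypothetical sketch of a "declarative" operator: it holds only data,
# so a scheduler could evaluate it directly instead of shipping it to a worker.

WEEKDAYS = ("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")

@dataclass(frozen=True)
class DayOfWeekCheck:
    """Declarative check: 'is the logical date on one of these weekdays?'"""
    allowed_days: FrozenSet[str]

    def evaluate(self, logical_date: datetime) -> bool:
        # Pure function of the logical date - no user code, no side effects,
        # so it is cheap and safe to evaluate inside the scheduler loop.
        return WEEKDAYS[logical_date.weekday()] in self.allowed_days

if __name__ == "__main__":
    check = DayOfWeekCheck(allowed_days=frozenset({"MON", "WED", "FRI"}))
    for day in (1, 2, 3):  # 2021-11-01 was a Monday
        logical_date = datetime(2021, 11, day)
        decision = "run downstream" if check.evaluate(logical_date) else "skip downstream"
        print(f"{logical_date:%Y-%m-%d (%a)}: {decision}")
```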
> Why do I think the Celery Executor should be "gone" (possibly not immediately, but with less priority)? The problem with Celery is that even with KEDA autoscaling, the Celery Executor has big problems with scaling in (we also had discussions about this recently - with the AWS team, among others). Celery is complex and we are using maybe 5% of its capabilities. I also had a recent discussion (at PyWaw, where I gave a talk about Airflow dependencies) with people who use Celery heavily in their product and utilise a lot more of those capabilities - and they are rather unhappy with the problems they have to deal with and the stability of Celery's more complex features.
>
> I'd love to hear what others think on the subject. It would be great to agree on a common "direction" we are heading in and a "vision" of Airflow's future when it comes to executors. I have a feeling that we are just about at a pivotal point where we can all consciously change our paradigm of thinking about Airflow executors and prioritise things differently.
>
> J.
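Coming back to the "run workloads" idea from Ash's mail: a minimal, hypothetical sketch of what such an extended executor interface could look like - a "workload" is either "execute this TI" or "run this callback", and a subclass is free to route each kind differently (e.g. callbacks to Lambdas). None of these names are the current Airflow Executor API:

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical sketch - not the current Airflow Executor API.

@dataclass
class RunTaskInstance:
    dag_id: str
    task_id: str
    run_id: str

@dataclass
class RunCallback:
    dag_id: str
    kind: str  # e.g. "sla_miss", "on_failure"

Workload = Union[RunTaskInstance, RunCallback]

class BaseWorkloadExecutor:
    """Executors accept generic 'workloads' instead of only task instances."""

    def __init__(self) -> None:
        self.queued: List[Workload] = []

    def queue_workload(self, workload: Workload) -> None:
        self.queued.append(workload)

    def heartbeat(self) -> None:
        # Base behaviour: hand every queued workload to the _run() hook.
        # A subclass could instead write callbacks to a DB table, send them
        # to a Lambda, etc. - the scheduler itself never runs user code.
        while self.queued:
            self._run(self.queued.pop(0))

    def _run(self, workload: Workload) -> None:
        raise NotImplementedError

class LoggingExecutor(BaseWorkloadExecutor):
    def _run(self, workload: Workload) -> None:
        if isinstance(workload, RunCallback):
            print(f"would run {workload.kind} callback for {workload.dag_id} in a Lambda")
        else:
            print(f"would run TI {workload.dag_id}.{workload.task_id} ({workload.run_id}) on Fargate")

if __name__ == "__main__":
    executor = LoggingExecutor()
    executor.queue_workload(RunTaskInstance("example_dag", "extract", "manual__2021-11-26"))
    executor.queue_workload(RunCallback("example_dag", kind="sla_miss"))
    executor.heartbeat()
```

The point of the sketch is only that the dispatch decision lives in the executor subclass, so the scheduler never has to execute user code itself - which is exactly what the multi-tenancy work needs.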