This split Fargate/Lambda executor idea has some relevance for the
AIP-1/multi-tenancy discussion too.
One of the things I had been considering for that is that we need to
move DAG-level callbacks out of the scheduler (they currently run via
the parsing process on each scheduler), as we can't have scheduler
nodes running /any/ user code in multi-tenancy for security reasons.
So my idea here is that we extend the role of the Executor to be "run
workloads" -- whether that is "execute this TI" or "run this DAG SLA
miss callback". Crucially, it _doesn't_ have to run them all the same
way: a BaseExecutor could write the callbacks into a DB table that
processors pick up (mechanism TBD), but by having this be part of the
Executor interface we can subclass it, and in this Fargate/Lambda
example we could have callbacks run in Lambdas!
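To make that concrete, here is a rough sketch of the shape I have in
mind. None of these names exist today -- send_callback and
CallbackRequest are hypothetical, purely to illustrate "executors run
workloads":

    # Hypothetical sketch only -- send_callback / CallbackRequest are made up
    # here to illustrate the "executors run workloads" idea.
    from airflow.executors.base_executor import BaseExecutor


    class CallbackRequest:
        """A DAG-level callback (e.g. an SLA miss) to run somewhere safe."""

        def __init__(self, dag_id: str, callback_type: str):
            self.dag_id = dag_id
            self.callback_type = callback_type


    class WorkloadAwareExecutor(BaseExecutor):
        def send_callback(self, request: CallbackRequest) -> None:
            # Default: persist the request so a DAG processor can pick it up
            # (mechanism TBD) -- no user code runs on the scheduler itself.
            self._write_callback_to_db(request)

        def _write_callback_to_db(self, request: CallbackRequest) -> None:
            ...


    class LambdaCallbackExecutor(WorkloadAwareExecutor):
        def send_callback(self, request: CallbackRequest) -> None:
            # A subclass can swap the transport entirely, e.g. invoke an AWS
            # Lambda function that runs the callback in full isolation.
            self._invoke_lambda({"dag_id": request.dag_id,
                                 "type": request.callback_type})

        def _invoke_lambda(self, payload: dict) -> None:
            ...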
-a
On Thu, Nov 25 2021 at 23:18:17 +0000, "Oliveira, Niko"
<oniko...@amazon.com.INVALID> wrote:
We could even likely think about
adding more options of similar kind for GCP/AWS/Azure - using native
capabilities of those platforms rather than using generic "Kubernetes"
as remote execution. I can imagine using Fargate (AWS team could
contribute it), Cloud Run (Google team), Azure Container Instances
(maybe Microsoft will finally also embrace Airflow :)). That would
make the Airflow architecture more "Multiple Cloud Native".
From the AWS side we're very interested and happy to work on
something like a Fargate executor; it's on our roadmap either way.
But I think a generalized "cloud" or "serverless" executor would make
a lot of sense. From AWS alone you may want to execute "small" tasks
within a Lambda (quick start-up time but a small amount of compute and
a 15 min max run time) and then "medium" to "large" tasks in ECS
Fargate or Batch (with longer start-up times but more compute
available), etc. And the same goes for other cloud provider
equivalents. A harmonized and configurable solution could make
directing tasks to different execution environments very smooth.
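To sketch what I mean (nothing like this exists yet -- the
ServerlessExecutor name, the expected_runtime_secs hint and the backend
helpers are all hypothetical), the routing could hang off the standard
execute_async hook:

    # Hypothetical sketch of a "serverless" executor that picks an AWS backend
    # per task. None of these classes exist in Airflow today.
    from airflow.executors.base_executor import BaseExecutor

    LAMBDA_MAX_RUNTIME_SECS = 15 * 60  # hard AWS Lambda limit


    class ServerlessExecutor(BaseExecutor):
        def execute_async(self, key, command, queue=None, executor_config=None):
            config = executor_config or {}
            expected_runtime = config.get("expected_runtime_secs", 0)
            if expected_runtime and expected_runtime < LAMBDA_MAX_RUNTIME_SECS:
                # "Small" task: fast start-up, limited compute, 15 min cap.
                self._run_in_lambda(key, command)
            else:
                # "Medium"/"large" task: slower start-up, more compute.
                self._run_in_fargate_or_batch(key, command)

        def _run_in_lambda(self, key, command):
            ...

        def _run_in_fargate_or_batch(self, key, command):
            ...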
________________________________________
From: Jarek Potiuk <ja...@potiuk.com>
Sent: Thursday, November 25, 2021 2:40 AM
To: dev@airflow.apache.org
Subject: [EXTERNAL] [DISCUSS] Shaping the future of executors for
Airflow (slowly phasing out Celery ?)
Hello Everyone,
I recently had some discussions and some thoughts about new features -
already implemented, planned and in progress - and I had an idea that
is maybe worth discussing here.
It's very likely that many of the people involved have had similar
discussions and thoughts, but maybe it's worth spelling it out now and
agreeing on a common "direction" we are heading in for the future of
Airflow when it comes to executors.
TL;DR: I think the recent changes and possibly some future
improvements and optimisations can lead us to a situation where we will
not need the Celery Executor (nor CeleryKubernetes) and can phase it
out eventually - leaving only Local, Kubernetes and the soon-coming
LocalKubernetes one. We might still "support" CeleryExecutor for
backwards compatibility and for people who do not want to run
Kubernetes, but the main reasons why Celery would be preferred over
Kubernetes should be gone soon IMHO.
Why do I think so?
I think so because I believe the main problems that made the
CeleryExecutor necessary in the first place are largely gone. The main
reason why the Celery executor was better than the Kubernetes one was
that you could run many short tasks with far less overhead and latency.
However, we now have ways - either already implemented or easy to
optimise - of significantly decreasing the need to run small tasks via
"remote" executors.
The following things already happened:
1) We have Deferrable Operators support. The parts of operators that
wait for something - mostly small tasks or small pieces of bigger
operators - are already executed in the triggerer (a minimal example of
the deferral API is sketched right after this list).
2) We have an HA scheduler where you can run multiple schedulers with
the Local Executor - thus you can get scalability in LocalExecutor for
small tasks.
3) We had some optimisations for DummyOperator, where triggering is
done directly in the Scheduler.
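For (1), the deferral API already works roughly like this - the
operator class below is made up, but self.defer() and TimeDeltaTrigger
are the real, existing API:

    # Minimal example of the existing deferral API: the operator frees its
    # worker slot and the wait happens in the triggerer process instead.
    from datetime import timedelta

    from airflow.models.baseoperator import BaseOperator
    from airflow.triggers.temporal import TimeDeltaTrigger


    class WaitThenContinueOperator(BaseOperator):
        """Made-up operator; only the defer()/trigger mechanics are real."""

        def execute(self, context):
            # Hand the wait over to the triggerer instead of blocking a worker.
            self.defer(
                trigger=TimeDeltaTrigger(timedelta(minutes=30)),
                method_name="execute_complete",
            )

        def execute_complete(self, context, event=None):
            # The scheduler re-queues the task here once the trigger fires.
            self.log.info("Trigger fired, continuing: %s", event)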
What still can be done (or is already being done):
* While the triggerer does not (I believe) support multiple instances
for now, it has been designed from the ground up to support
HA/scalability.
* We can rewrite a lot of the operators we have to be Deferrable -
especially those that reach out to external services.
* We can make more "built-in" operators that have declarative
behaviour rather than an imperative "execute", and have them evaluated
directly in the Scheduler. We had a discussion about it in
<https://github.com/apache/airflow/pull/19361> - but it looks like it
should be possible to implement - for example - a "DayOfWeek" operator
that would be evaluated in the Scheduler, and triggering decisions
could be made there. We could probably add quite a number of such
"optimized" operators that could be declarative and evaluated in the
scheduler with virtually zero overhead (a rough sketch of what such an
operator could look like follows after this list).
* With the LocalKubernetes executor coming
<https://github.com/apache/airflow/pull/19729>, combined with the
HA/scalability of the scheduler (and thus scalability of Local
Executors), it seems that any reasonable installation will have enough
scalability and capacity to locally execute all the remaining "small"
tasks in Local Executors. We could even - eventually - try to figure
out some good pattern for detecting which tasks are "small" and
automatically using the LocalExecutor for them.
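Here is the rough sketch of a "declarative" operator I mentioned
above. It is purely illustrative - the evaluate_in_scheduler hook does
not exist anywhere, it only shows the idea that the scheduler could
make the decision from declared data without running user code:

    # Purely illustrative: a "declarative" operator the scheduler could
    # evaluate itself. The evaluate_in_scheduler hook is invented for this
    # sketch and does not exist in Airflow.
    from airflow.models.baseoperator import BaseOperator


    class DayOfWeekBranchSketch(BaseOperator):
        """Declares a branching decision instead of running arbitrary code."""

        def __init__(self, *, weekday, follow_task_id, otherwise_task_id,
                     **kwargs):
            super().__init__(**kwargs)
            self.weekday = weekday
            self.follow_task_id = follow_task_id
            self.otherwise_task_id = otherwise_task_id

        def evaluate_in_scheduler(self, logical_date):
            # A pure function of data the scheduler already knows: no user
            # code, no worker slot, virtually zero overhead.
            if logical_date.strftime("%A") == self.weekday:
                return self.follow_task_id
            return self.otherwise_task_id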
It seems to me that with those upcoming changes, LocalKubernetes
should be the default executor in the future rather than Celery (which
is now kind of the de facto "default"). We could even likely think
about adding more options of a similar kind for GCP/AWS/Azure - using
native capabilities of those platforms rather than using generic
"Kubernetes" as remote execution. I can imagine using Fargate (the AWS
team could contribute it), Cloud Run (the Google team), Azure Container
Instances (maybe Microsoft will finally also embrace Airflow :)). That
would make the Airflow architecture more "Multiple Cloud Native".
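To show what "default" could look like for a DAG author: the sketch
below is based on the in-progress LocalKubernetes PR, so the config
section/key names and the "kubernetes" queue value are my assumptions
and may still change before it lands:

    # Assumed configuration (may change before the PR is merged):
    #
    #   [core]
    #   executor = LocalKubernetesExecutor
    #
    #   [local_kubernetes_executor]
    #   kubernetes_queue = kubernetes
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG("routing_example", start_date=datetime(2021, 11, 1),
             schedule_interval=None) as dag:
        # Tasks on the "kubernetes" queue get a pod; everything else runs in
        # the LocalExecutor right next to the scheduler.
        small = PythonOperator(task_id="small", python_callable=lambda: None)
        big = PythonOperator(
            task_id="big",
            python_callable=lambda: None,
            queue="kubernetes",
        )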
Why do I think the Celery Executor should be "gone" (possibly not
immediately, but with less priority)?
The problem with Celery is that even with KEDA autoscaling the Celery
Executor has big problems with scaling in (we also had discussions
about it recently - with the AWS team among others). Celery is complex
and we are using maybe 5% of its capabilities. However, I had a recent
discussion (at PyWaw, where I gave a talk about Airflow dependencies)
with people who are heavily using Celery with their product and utilise
a lot more of those capabilities, and they are rather unhappy with the
problems they have to deal with and the stability of Celery's more
complex features.
I'd love to hear what others think on the subject. It would be great
to agree on a common "direction" we are heading in and a "vision" of
the future of Airflow when it comes to Executors, and I have a feeling
that we are just about at a pivotal point where we can all consciously
change our paradigm of thinking about Airflow executors and prioritise
things differently.
J.