Description

In the beginning, the Airflow community took integration with existing
tools as the first priority:

   - use Celery as the task scheduling framework
   - use PostgreSQL, MySQL, or MSSQL as the meta-database backend

Later, the community split providers out of the core architecture, which
brought a large number of providers
<https://airflow.apache.org/docs/apache-airflow-providers/> into Airflow.

Now Airflow has become a popular distributed, cloud-native workflow
management platform.

I think maybe we can make the scheduler pluggable.

Currently we have the following constraints:

   1. The scheduler database requirements
   
<https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/scheduler.html#database-requirements>
   bring some performance bottlenecks
   2. A SQL-compatible meta-database backend is required

In fact, the Airflow platform relies on these dependencies:

   1. an AMQP-compatible task queue, which the Celery framework relies on
   and which uses Redis as the default implementation; this has been
   optional since the Kubernetes Executor
   
<https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/kubernetes.html#>
   was introduced as an option
   2. metadata storage
   3. a distributed lock (maybe we can partition the scheduler/executor in
   the future)

Today, dependencies 2 and 3 are actually bound to the SQL-compatible
meta-database backend requirement.

If we can make these three dependencies pluggable, we can definitely use a
k8s-compatible solution like *ETCD*, which could take on all three duties
without introducing new external dependencies in a k8s environment.
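To make the idea concrete, here is a minimal sketch of what such a
pluggable backend interface might look like. All names here are
hypothetical and are not actual Airflow APIs; an ETCD-backed
implementation would map these methods onto etcd primitives (key-value
puts/gets for metadata, watches for the queue, leases for locks). An
in-memory implementation is shown only for illustration:

```python
import threading
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class ClusterBackend(ABC):
    """Hypothetical pluggable backend covering the three dependencies
    above: metadata storage, task queue, and distributed lock."""

    @abstractmethod
    def put_metadata(self, key: str, value: str) -> None: ...

    @abstractmethod
    def get_metadata(self, key: str) -> Optional[str]: ...

    @abstractmethod
    def enqueue(self, task: str) -> None: ...

    @abstractmethod
    def dequeue(self) -> Optional[str]: ...

    @abstractmethod
    def try_lock(self, name: str) -> bool: ...

    @abstractmethod
    def unlock(self, name: str) -> None: ...


class InMemoryBackend(ClusterBackend):
    """Single-process stand-in for demonstration; an etcd-backed version
    would replace the dict/deque/set with etcd key-value storage,
    watch-driven queues, and lease-based locks."""

    def __init__(self) -> None:
        self._kv: dict = {}
        self._queue: deque = deque()
        self._locks: set = set()
        self._mutex = threading.Lock()

    def put_metadata(self, key: str, value: str) -> None:
        with self._mutex:
            self._kv[key] = value

    def get_metadata(self, key: str) -> Optional[str]:
        with self._mutex:
            return self._kv.get(key)

    def enqueue(self, task: str) -> None:
        with self._mutex:
            self._queue.append(task)

    def dequeue(self) -> Optional[str]:
        with self._mutex:
            return self._queue.popleft() if self._queue else None

    def try_lock(self, name: str) -> bool:
        with self._mutex:
            if name in self._locks:
                return False
            self._locks.add(name)
            return True

    def unlock(self, name: str) -> None:
        with self._mutex:
            self._locks.discard(name)
```

With an abstraction like this, the SQL backend would just be one
implementation among several, and a k8s deployment could choose an
etcd-backed one instead.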

But I am still a newcomer to the community, and all of the above is just
my immature thinking.

You are welcome to correct me if I am wrong.

I am willing to learn much more about the architectural thinking in our
community.

Use case/motivation

   1. further decouple Airflow from any specific meta-database backend
   implementation
   2. bring ETCD in as a meta-database backend/task queue, which may
   benefit the Airflow cloud-native roadmap
   3. make the scheduler pluggable
