Description

In the beginning, the Airflow community took integration with existing
tools as the first priority:

   - use Celery as the task scheduling framework
   - use PostgreSQL, MySQL, or MSSQL as the meta-database backend

Later, the community split providers out of the core architecture, which
brought a large number of providers
<https://airflow.apache.org/docs/apache-airflow-providers/> into Airflow.

Now Airflow has become a popular distributed, cloud-native workflow
management platform.

I think maybe we can make the scheduler pluggable.

Currently we have the following constraints:

   1. The scheduler database requirements
   
<https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/scheduler.html#database-requirements>
   bring some performance bottlenecks
   2. A SQL-compatible meta-database backend is required

In fact, the Airflow platform relies on these dependencies:

   1. an AMQP-compatible task queue, which the Celery framework relies on
   and which uses Redis as the default implementation; this has been
   optional since the Kubernetes Executor
   
<https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/kubernetes.html#>
   was introduced as an option
   2. metadata storage
   3. a distributed lock (maybe we can partition the scheduler/executor in
   the future)

Today, dependencies 2 and 3 are actually bound to the SQL-compatible
meta-database backend requirement.

If we can make these three dependencies pluggable, we can definitely use a
k8s-compatible solution like *ETCD*, which could take on all three duties
without introducing new external dependencies in a k8s environment.
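To make the idea concrete, here is a minimal sketch of what such a
pluggable backend interface might look like. All names here are
hypothetical and are not actual Airflow APIs; an ETCD-backed
implementation would map these methods onto etcd primitives (key-value
puts/gets for metadata, watches for the queue, leases for locks). An
in-memory implementation is shown only for illustration:

```python
import threading
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class ClusterBackend(ABC):
    """Hypothetical pluggable backend covering the three dependencies
    above: metadata storage, task queue, and distributed lock."""

    @abstractmethod
    def put_metadata(self, key: str, value: str) -> None: ...

    @abstractmethod
    def get_metadata(self, key: str) -> Optional[str]: ...

    @abstractmethod
    def enqueue(self, task: str) -> None: ...

    @abstractmethod
    def dequeue(self) -> Optional[str]: ...

    @abstractmethod
    def try_lock(self, name: str) -> bool: ...

    @abstractmethod
    def unlock(self, name: str) -> None: ...


class InMemoryBackend(ClusterBackend):
    """Single-process stand-in for demonstration; an etcd-backed version
    would replace the dict/deque/set with etcd key-value storage,
    watch-driven queues, and lease-based locks."""

    def __init__(self) -> None:
        self._kv: dict = {}
        self._queue: deque = deque()
        self._locks: set = set()
        self._mutex = threading.Lock()

    def put_metadata(self, key: str, value: str) -> None:
        with self._mutex:
            self._kv[key] = value

    def get_metadata(self, key: str) -> Optional[str]:
        with self._mutex:
            return self._kv.get(key)

    def enqueue(self, task: str) -> None:
        with self._mutex:
            self._queue.append(task)

    def dequeue(self) -> Optional[str]:
        with self._mutex:
            return self._queue.popleft() if self._queue else None

    def try_lock(self, name: str) -> bool:
        with self._mutex:
            if name in self._locks:
                return False
            self._locks.add(name)
            return True

    def unlock(self, name: str) -> None:
        with self._mutex:
            self._locks.discard(name)
```

With an abstraction like this, the SQL backend would just be one
implementation among several, and a k8s deployment could choose an
etcd-backed one instead.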

But I am still a newcomer to the community, and all of the above is just
my immature thinking.

You are welcome to correct me if I am wrong.

I am willing to learn much more about the architectural thinking in our
community.

Use case/motivation

   1. further decouple Airflow from any specific meta-database backend
   implementation
   2. bring ETCD in as a meta-database backend/task queue, which may
   benefit the Airflow cloud-native roadmap
   3. make the scheduler pluggable
