Description

In the beginning, the Airflow community treated integrity as the first priority and made the following choices:

- use Celery as the task scheduling framework
- use PostgreSQL, MySQL, or MSSQL as the metadata database backend

The community then split providers out of the core architecture, which brought a large number of providers <https://airflow.apache.org/docs/apache-airflow-providers/> into Airflow. Airflow has since become a popular distributed, cloud-native workflow management platform.

I think we could make the scheduler pluggable. Today we have the following constraints:

1. The Scheduler Database Requirements <https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/scheduler.html#database-requirements> introduce a performance bottleneck.
2. The metadata database backend must be SQL-compatible.

In fact, the Airflow platform relies on these dependencies:

1. A Celery-compatible task queue (message broker), typically Redis or an AMQP broker such as RabbitMQ. It is optional now that the Kubernetes Executor <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/kubernetes.html#> is available.
2. Metadata storage.
3. A distributed lock (maybe we can partition the scheduler/executor in the future).

Dependencies 2 and 3 are currently both bound to the SQL-compatible metadata database requirement. If we make all three dependencies pluggable, we could use a k8s-compatible solution such as *ETCD*, which can take on all three duties without bringing new external dependencies into the k8s environment.

I am a newcomer to the community, and everything above is only my early thinking. Corrections are welcome. I am willing to learn much more about the architectural thinking in our community.

Use case/motivation

1. Further decouple Airflow from any specific metadata database backend implementation.
2. Bring in ETCD as a metadata database backend/task queue, which may benefit Airflow's cloud-native roadmap.
3. Make the scheduler pluggable.
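
To make the idea a bit more concrete, here is a minimal sketch of how the three dependencies (task queue, metadata storage, distributed lock) could be expressed as small interfaces that one backend implements. Every name here (`MetadataStore`, `DistributedLock`, `TaskQueue`, `InMemoryBackend`, `schedule_task`) is made up for illustration and is not an Airflow API. The in-memory backend only stands in for something like ETCD and works in a single process.

```python
import threading
from collections import deque
from typing import Optional, Protocol


class MetadataStore(Protocol):
    """Hypothetical interface for metadata storage."""
    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class DistributedLock(Protocol):
    """Hypothetical interface for a named lock shared by schedulers."""
    def acquire(self, name: str, owner: str) -> bool: ...
    def release(self, name: str, owner: str) -> None: ...


class TaskQueue(Protocol):
    """Hypothetical interface for handing tasks to executors."""
    def enqueue(self, task_id: str) -> None: ...
    def dequeue(self) -> Optional[str]: ...


class InMemoryBackend:
    """Toy single-process backend that plays all three roles at once."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the structures below
        self._kv = {}                   # key -> value
        self._locks = {}                # lock name -> current owner
        self._queue = deque()           # FIFO of task ids

    def put(self, key: str, value: str) -> None:
        with self._guard:
            self._kv[key] = value

    def get(self, key: str) -> Optional[str]:
        with self._guard:
            return self._kv.get(key)

    def acquire(self, name: str, owner: str) -> bool:
        # Succeeds if the lock is free or already held by the same owner.
        with self._guard:
            return self._locks.setdefault(name, owner) == owner

    def release(self, name: str, owner: str) -> None:
        # Only the current owner can release the lock.
        with self._guard:
            if self._locks.get(name) == owner:
                del self._locks[name]

    def enqueue(self, task_id: str) -> None:
        with self._guard:
            self._queue.append(task_id)

    def dequeue(self) -> Optional[str]:
        with self._guard:
            return self._queue.popleft() if self._queue else None


def schedule_task(
    store: MetadataStore,
    lock: DistributedLock,
    queue: TaskQueue,
    scheduler_id: str,
    task_id: str,
) -> bool:
    """Queue one task under the lock; return False if another scheduler holds it."""
    if not lock.acquire("scheduling", scheduler_id):
        return False
    try:
        store.put(f"task/{task_id}/state", "queued")
        queue.enqueue(task_id)
        return True
    finally:
        lock.release("scheduling", scheduler_id)


if __name__ == "__main__":
    backend = InMemoryBackend()
    # The same object is passed three times because it fills all three roles.
    print(schedule_task(backend, backend, backend, "scheduler-a", "task-1"))
    print(backend.dequeue())
```

Since `schedule_task` only depends on the three interfaces, the SQL database, a Celery broker, or a single ETCD-like store could each be plugged in without changing the scheduling logic. A real implementation would also need lock expiry (leases) and atomic updates, which this sketch leaves out.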
