IMHO this is not splitting the scheduler but completely rewriting it from scratch. If you decouple the database from the scheduler there is nothing left, because roughly 80% of its code is bound to SQLAlchemy and the relational database.
So I would say this post should be named "Should we start building a new Airflow from scratch". But this is just my opinion; I might be biased and very wrong on that.

J.

On Thu, Aug 3, 2023 at 6:18 PM Huang Junyao <[email protected]> wrote:
>
> Description
>
> In the beginning, the Airflow community takes integrity as the first
> priority,
>
>    - use Celery as a task schedule framework
>    - use PostgreSQL, MySQL, or MSSQL as meta database backend
>
> And the community splits providers from the architecture, which brings a
> large number of providers
> <https://airflow.apache.org/docs/apache-airflow-providers/> into Airflow.
>
> Now Airflow has been the popular distributed, cloud-native workflow
> management platform.
>
> I think maybe we can make the scheduler pluggable.
>
> Now we have the following constraints:
>
>    1. Scheduler Database Requirements
>    <https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/scheduler.html#database-requirements>
>    bring some performance bottleneck
>    2. SQL-Compatible meta database backend requirements
>
> In fact, the Airflow platform relies on these dependencies:
>
>    1. AMQP-Compatible Task Queue, which is relied on by the Celery
>    framework and uses Redis as the default implementation, is optional since
>    we bring Kubernetes Executor
>    <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/kubernetes.html#>
>    as an option.
>    2. metadata storage
>    3. distributed lock (maybe we can partition scheduler/executor in the
>    future)
>
> Now 2/3 actually binds into the SQL-Compatible meta database backend
> requirements.
>
> If we can make these 3 dependencies pluggable, we can definitely use some
> k8s-compatible solution,
> like *ETCD*, which can undertake these 3 duties instead of bringing new
> external dependencies in the k8s environment.
>
> But I am indeed a freshman in the community, all these above are my
> immature thinking.
>
> welcome to correct me if wrong.
>
> I am willing to learn much more about architectural thinking in our
> community.
> Use case/motivation
>
>    1. further decoupling airflow from specific meta-database backend
>    implementation
>    2. brings ETCD as meta database backend/task queue, which may benefit
>    airflow cloud-native roadmap
>    3. make the scheduler pluggable
