Hi Airflow community,

We are proposing to have the Airflow Scheduler adopt a pluggable pattern, similar to the executor.
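To make the idea concrete before diving into the background, here is a minimal sketch of the shape this could take, mirroring how the executor is selected via configuration today. Everything below (the config key, the BaseSchedulerJob name, the exact method set) is hypothetical and only illustrates the pattern, not a settled design:

    # In airflow.cfg (hypothetical key, analogous to "[core] executor = ..."):
    #
    #   [core]
    #   scheduler = my_company.schedulers.FairShareSchedulerJob
    #
    # A custom scheduler would then implement a small, stable interface:
    from abc import ABC, abstractmethod


    class BaseSchedulerJob(ABC):
        """Hypothetical contract for pluggable schedulers (all names invented)."""

        @abstractmethod
        def _do_scheduling(self, session) -> int:
            """Run one scheduling loop iteration and return the number of
            task instances queued (mirrors today's SchedulerJob._do_scheduling)."""

        @abstractmethod
        def adopt_or_reset_orphaned_tasks(self, session) -> int:
            """Recover task instances owned by schedulers that have died."""

Alternative schedulers (for example, one with per-DAG fairness, or one tuned for very large clusters) could then be developed and selected without changing the scheduler core.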
Background: Airflow 2.0 introduced a new scheduler in AIP-15 (Scheduler HA + performance improvement) <https://airflow.apache.org/blog/airflow-two-point-oh-is-here/#massive-scheduler-performance-improvements>. The new scheduler leverages the database's SELECT ... FOR UPDATE SKIP LOCKED feature to scale horizontally <https://airflow.apache.org/docs/apache-airflow/stable/concepts/scheduler.html#overview> (a simplified sketch of the relevant query is in the P.S. below). It works well for relatively small clusters (a small number of tasks per DAG and a small number of DAG files), as shown in benchmark results from the community:

    Scenario (1000 tasks in total)                DAG shape     1.10.10 total task lag   2.0 beta total task lag   Speedup
    100 DAG files, 1 DAG per file, 10 tasks/DAG   Linear        200 seconds              11.6 seconds              17 times
    10 DAG files, 1 DAG per file, 100 tasks/DAG   Linear        144 seconds              14.3 seconds              10 times
    10 DAG files, 10 DAGs per file, 10 tasks/DAG  Binary tree   200 seconds              12 seconds                16 times

    (From: https://www.astronomer.io/blog/airflow-2-scheduler)

From the most recent 2022 Airflow survey <https://docs.google.com/document/d/18E3gBbrPI6cHAKRkRIPfju9pOk4EJNd2M-1fRJO4glA/edit#heading=h.yhlzd4j2mpzz>, 81% of Airflow users have between 1 and 250 DAGs in their largest Airflow instance (4.8% of users have more than 1000 DAGs), and 75% of surveyed users have between 1 and 100 tasks per DAG. The Airflow 2.0 scheduler can satisfy these needs. However, there are cases where it cannot be deployed:

1. The team cannot run more than one scheduler because the company's database team does not support MySQL 8+ or PostgreSQL 10+. (Arguably these versions should be supported, but in reality it can take large companies quite a while to upgrade to newer database versions.)

2. Airflow 2.0 treats all DagRuns with the same scheduling priority (see code <https://github.com/apache/airflow/blob/6b7a343b25b06ab592f19b7e70843dda2d7e0fdb/airflow/jobs/scheduler_job.py#L923> and the sketch in the P.S. below). This means DAGs with more DagRuns can be scheduled more often than others, and large DAGs can slow down the scheduling of small DAGs. This may not be desirable in some cases.

3. For very large clusters (more than 10 million rows in the task instance table), the database tends to be the least stable component, and the infra team does not want to add extra load to it by running more than one scheduler. A single Airflow 2.0 scheduler, however, cannot support such clusters either: the multi-process scheduling of DAG runs was removed, so one scheduler uses only one core to schedule all DAG runs <https://github.com/apache/airflow/blob/6b7a343b25b06ab592f19b7e70843dda2d7e0fdb/airflow/jobs/scheduler_job.py#L886-L976>.

These limitations hinder evolving Airflow into a general-purpose scheduling platform. To address them without making the scheduler core code larger and its logic more complex, we propose a pluggable scheduler pattern. With it, Airflow infra teams and users can choose the scheduler that best satisfies their needs, and even swap out the parts that need customization.

Please let me know your thoughts; I look forward to your feedback. (Here is the Google Doc link: https://docs.google.com/document/d/1njmX3D_9a4TjjG9CYPWJqdkb9EyXkeQPnycYaMTUQ_s/edit?usp=sharing; feel free to comment in the doc.)

Thanks,
Ping
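P.S. For readers less familiar with the scheduler internals linked above, here is a simplified paraphrase of how the 2.0 scheduler picks DagRuns to examine, which is where both the SKIP LOCKED database requirement and the uniform scheduling priority come from. This is not the exact Airflow code: the real query joins more tables and uses Airflow's dialect-aware helpers, while this sketch uses plain SQLAlchemy.

    from sqlalchemy import nullsfirst

    from airflow.models import DagRun
    from airflow.utils.state import State


    def _next_dagruns_to_examine(session, max_number=20):
        """Simplified paraphrase of the DagRun selection in scheduler_job.py."""
        return (
            session.query(DagRun)
            .filter(DagRun.state == State.RUNNING)
            # All running DagRuns are ordered only by when they were last
            # examined -- there is no per-DAG priority or fairness, which is
            # the root of limitation 2.
            .order_by(nullsfirst(DagRun.last_scheduling_decision))
            .limit(max_number)
            # SELECT ... FOR UPDATE SKIP LOCKED lets multiple schedulers pick
            # disjoint rows without blocking each other; this is what enables
            # HA, and why MySQL 8+ / PostgreSQL 10+ is needed (limitation 1).
            .with_for_update(skip_locked=True, of=DagRun)
        )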
