Hi Airflow community, I would like to share some of my thoughts on the active-active scheduler HA mode.
I am wondering whether the active-active scheduler mode is really needed to improve scheduler performance. After our Next-Gen Scheduler work, a single scheduler host easily supports ~5000 DAGs in our production, with a maximum scheduling delay of only ~60 seconds (for the largest DAG, which has ~23K tasks). I don't see a need to set up active-active schedulers for performance reasons.

[image: image.png]

Setting up the active-active scheduler mode only increases the complexity of cluster operations. It also places restrictions on the database, including which DB types and DB versions can be used.

I do agree that the Airflow scheduler needs better HA. We could use an active-passive mode instead. This would greatly simplify the scheduler code, since it would no longer need locking in the code or have to deal with potential deadlocks.

We have noticed that the majority of our prod incidents come from the database. The current active-active HA mode might exacerbate that problem.

Would love to hear your thoughts about this.

Best wishes,
Ping Zhang
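
P.S. To make the active-passive idea a bit more concrete, here is a rough sketch of lease-based failover between two schedulers. It is only an illustration under my own assumptions: LeaseStore, try_acquire and simulate are made-up names, not Airflow APIs, and the in-memory store stands in for a single lease record that a real deployment would have to update atomically (for example with one conditional UPDATE).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Lease:
    holder: Optional[str] = None  # scheduler that is currently active
    expires_at: float = 0.0       # time after which a standby may take over


class LeaseStore:
    """In-memory stand-in for one lease record (hypothetical, not an Airflow API)."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.lease = Lease()

    def try_acquire(self, scheduler: str, now: float) -> bool:
        """Acquire or renew the lease; return True if `scheduler` is the active one.

        In a real system this check-and-set must be a single atomic operation.
        """
        free = self.lease.holder is None or now >= self.lease.expires_at
        if free or self.lease.holder == scheduler:
            self.lease = Lease(holder=scheduler, expires_at=now + self.ttl)
            return True
        return False


def simulate() -> List[Tuple[str, float, bool]]:
    """Scheduler A is active, crashes after t=5, and B takes over once the lease expires."""
    store = LeaseStore(ttl=10.0)
    attempts = [
        ("A", 0.0),   # A starts first and becomes active
        ("B", 0.0),   # B stays passive
        ("A", 5.0),   # A renews; lease now runs until t=15
        ("B", 12.0),  # A has crashed, but the lease has not expired yet
        ("B", 16.0),  # lease expired, B becomes active
        ("A", 17.0),  # A comes back and must stay passive
    ]
    return [(name, t, store.try_acquire(name, t)) for name, t in attempts]


if __name__ == "__main__":
    for name, t, active in simulate():
        print(f"t={t:>4}: scheduler {name} active={active}")
```

Only the lease holder would run the scheduling loop, so the scheduling code itself would not need row-level locks. The trade-off is that failover takes up to one lease TTL.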
