Hi Airflow community,

I would like to share some of my thoughts on the active-active scheduler HA
mode.

I am wondering whether active-active scheduler mode is really needed to
improve scheduler performance.

After our Next-Gen Scheduler work, a single scheduler host in our production
easily supports ~5000 DAGs, with a maximum scheduling delay of only ~60
seconds (for our largest DAG, which has ~23K tasks).

I don't see a need to set up active-active schedulers for performance
reasons.

Setting up active-active scheduler mode can only increase the complexity of
cluster operations. It also puts restrictions on the database, including
which DB types and DB versions are supported.

I do agree that the Airflow scheduler needs better HA. We could use
active-passive mode instead. This would greatly simplify the scheduler code,
since it would not need locks in the code or have to deal with potential
deadlocks.
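To make the active-passive idea concrete, here is a minimal sketch assuming a
lease-based leader election. The names LeaseStore, try_acquire and run_tick
are hypothetical and are not Airflow APIs. Only the node holding the lease
runs the scheduling loop, so the loop itself needs no row-level locks. The
in-memory store is only a stand-in: a real deployment would need an atomic
compare-and-set backend for the lease, and the active node would have to stop
scheduling as soon as it fails to renew before expiry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: Optional[str] = None
    expires_at: float = 0.0


class LeaseStore:
    """Hypothetical single-lease store (in-memory stand-in).

    In practice the acquire step must be one atomic compare-and-set,
    e.g. a single conditional UPDATE on one DB row; this class is not
    safe across processes.
    """

    def __init__(self) -> None:
        self._lease = Lease()

    def try_acquire(self, node_id: str, ttl: float, now: float) -> bool:
        lease = self._lease
        # Grant (or renew) if the lease is free, already ours, or expired.
        if lease.holder in (None, node_id) or lease.expires_at <= now:
            self._lease = Lease(holder=node_id, expires_at=now + ttl)
            return True
        return False

    def holder(self) -> Optional[str]:
        return self._lease.holder


def run_tick(store: LeaseStore, node_id: str, ttl: float, now: float) -> bool:
    """One iteration of a hypothetical active-passive scheduler node."""
    if store.try_acquire(node_id, ttl, now):
        # Active node: run the normal scheduling loop here. It is the only
        # node scheduling, so no per-row locks are needed.
        return True
    # Passive node: stay idle and retry on the next tick.
    return False
```

For example, with a 10-second lease, node "a" becomes active at t=0 and
renews at t=6; node "b" stays passive at t=5 and only takes over at t=20,
after a's lease (valid until t=16) has expired.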

We noticed that the majority of our prod incidents come from the database.
The current active-active HA mode might exacerbate this problem, since every
active scheduler runs its queries and takes its locks against the same
database.

Would love to hear your thoughts about this.


Best wishes

Ping Zhang
