[DISCUSS] Drop support for the DAG processor embedded in the scheduler

Jed Cunningham Thu, 09 Jan 2025 15:28:37 -0800

Hello everyone!

As I've been working on parsing lately, I want to propose a change in that
area in time for Airflow 3.


Today there are two ways to run the DAG processor in Airflow: as a
standalone component, or embedded in the scheduler. The standalone option
arrived in 2.3; before that, embedding it in the scheduler was the only
option.

Why standalone? Generally speaking, parsing scales vertically (a single
loop with more concurrent parsing) while scheduling scales horizontally
(many loops). Because the DAG processor and scheduler scale in different
ways, it's awkward to have them live in the same component. There is also
a resiliency benefit: running them separately avoids noisy neighbor issues
between parsing and scheduling.

Really, the only advantage of the embedded option is that it's easier to
deploy, as there is one less component to worry about. However, we already
have a number of components, so one more isn't that cumbersome. Anyone
using breeze, standalone, the helm chart, or a vendor won't be impacted
much by this change - in fact, having the log stream separate is a big
positive!

We'd also be able to remove a bit of complexity around reinitialising a
bunch of stuff in the child process.

Overall, I see primarily positives with this change, and a major version
upgrade is the perfect time to simplify this part of Airflow. Thoughts?

Jed
