I'm +1 on this.

Having one more thing to deploy isn't that big of an issue given the
number of pre-configured deployment options mentioned (e.g. helm), and a
full logical separation of DAG parsing and scheduling makes sense. One
longstanding issue with Airflow is the scheduler "doing too many things",
so it would be nice to create a clean divide here.

On Thu, Jan 9, 2025 at 3:28 PM Jed Cunningham <j...@astronomer.io.invalid>
wrote:

> Hello everyone!
>
> As I've been working on parsing lately, I want to propose a change in that
> area in time for Airflow 3.
>
> Today there are 2 different ways the DAG processor can be run in Airflow -
> as a standalone component, or embedded in the scheduler. The standalone
> option came in 2.3; prior to that, the only option was embedding it in
> the scheduler.
>
> Why standalone? Generally speaking, parsing scales vertically (single loop
> - more concurrent parsing) while scheduling is scaled horizontally (many
> loops). As the DAG processor and scheduler scale in different manners, it's
> awkward to have them live in the same component. There is also a resiliency
> aspect here: no noisy neighbor issues.
>
> Really, the only positive of the embedded option is that it's easier to
> deploy, as there is 1 less component to worry about. However, we already
> have a number of components, so 1 more isn't that cumbersome. Everyone
> using breeze, standalone, the helm chart, or a vendor won't be impacted much
> by this change - in fact, having the log stream separate is a big positive!
>
> We'd also be able to remove a bit of complexity around reinitialising a
> bunch of stuff in the child process.
>
> Overall, I see primarily positives with this change, and a major version
> upgrade is the perfect time to simplify this part of Airflow. Thoughts?
>
> Jed
>
