I'm +1 on this. Having one more thing to deploy isn't that big of an issue given the number of pre-configured deployment options mentioned (e.g. helm), and a full logical separation of DAG parsing and scheduling makes sense. One longstanding issue with Airflow is the scheduler "doing too many things", so it would be nice to create a clean divide here.

On Thu, Jan 9, 2025 at 3:28 PM Jed Cunningham <j...@astronomer.io.invalid> wrote:

> Hello everyone!
>
> As I've been working on parsing lately, I want to propose a change in that
> area in time for Airflow 3.
>
> Today there are 2 different ways the DAG processor can be run in Airflow -
> as a standalone component, or embedded in the scheduler. The standalone
> option came in 2.3, prior to that the only option was it being embedded in
> the scheduler.
>
> Why standalone? Generally speaking, parsing scales vertically (single loop
> - more concurrent parsing) while scheduling is scaled horizontally (many
> loops). As the DAG processor and scheduler scale in different manners, it's
> awkward to have them live in the same component. There is also a resiliency
> aspect here, no noisy neighbor issues.
>
> Really, the only positive of the embedded option is that it's easier to
> deploy, as there is 1 less component to worry about. However, we already
> have a number of components, so 1 more isn't that cumbersome. Everyone
> using breeze, standalone, the helm chart, a vendor, won't be impacted much
> by this change - in fact, having the log stream separate is a big positive!
>
> We'd also be able to remove a bit of complexity around reinitialising a
> bunch of stuff in the child process.
>
> Overall, I see primarily positives with this change, and a major version
> upgrade is the perfect time to simplify this part of Airflow. Thoughts?
>
> Jed