Hello everyone!

As I've been working on parsing lately, I want to propose a change in that area in time for Airflow 3.

Today there are two ways to run the DAG processor in Airflow: as a standalone component, or embedded in the scheduler. The standalone option arrived in 2.3; before that, running it embedded in the scheduler was the only option. For Airflow 3, I'd like standalone to be the only option.

Why standalone? Generally speaking, parsing scales vertically (a single loop with more concurrent parsing), while scheduling scales horizontally (many loops). Because the DAG processor and the scheduler scale in different ways, it's awkward to have them live in the same component. There is also a resiliency aspect: with separate components, parsing and scheduling can't be noisy neighbors to each other.

Really, the only positive of the embedded option is that it's easier to deploy, as there is one less component to worry about. However, we already have a number of components, so one more isn't that cumbersome. Anyone using breeze, airflow standalone, the helm chart, or a vendor won't be impacted much by this change. In fact, having the log stream separate is a big positive! We'd also be able to remove a bit of complexity around reinitialising a bunch of stuff in the child process.

Overall, I see primarily positives with this change, and a major version upgrade is the perfect time to simplify this part of Airflow.

Thoughts?

Jed