+1 on this as well. From what I have seen, standalone DAG processing gives a 
minor performance advantage and, more importantly, makes the scheduler loop 
more resilient to DAG processor crashes.

Shubham

On 2025-01-09, 4:02 PM, "Daniel Imberman" <daniel.imber...@gmail.com> wrote:


I'm +1 on this.


Having one more thing to deploy isn't that big of an issue given the number
of pre-configurable options mentioned (e.g. helm), and a full logical
separation of DAG parsing and scheduling makes sense. One longstanding issue
with Airflow is the scheduler "doing too many things", so it would be nice
to create a clean divide here.


On Thu, Jan 9, 2025 at 3:28 PM Jed Cunningham <j...@astronomer.io.invalid>
wrote:


> Hello everyone!
>
> As I've been working on parsing lately, I want to propose a change in that
> area in time for Airflow 3.
>
> Today there are 2 different ways the DAG processor can be run in Airflow -
> as a standalone component, or embedded in the scheduler. The standalone
> option came in 2.3; prior to that, the only option was to embed it in
> the scheduler.
>
> Why standalone? Generally speaking, parsing scales vertically (single loop
> - more concurrent parsing) while scheduling scales horizontally (many
> loops). As the DAG processor and scheduler scale in different manners, it's
> awkward to have them live in the same component. There is also a resiliency
> aspect here: separating them avoids noisy neighbor issues.
>
> Really, the only positive of the embedded option is that it's easier to
> deploy, as there is 1 less component to worry about. However, we already
> have a number of components, so 1 more isn't that cumbersome. Everyone
> using breeze, standalone, the helm chart, or a vendor won't be impacted much
> by this change - in fact, having the log stream separate is a big positive!
>
> We'd also be able to remove a bit of complexity around reinitialising a
> bunch of stuff in the child process.
>
> Overall, I see primarily positives with this change, and a major version
> upgrade is the perfect time to simplify this part of Airflow. Thoughts?
>
> Jed
>


