+1

On Fri, 10 Jan 2025 at 07:43, Mehta, Shubham <shu...@amazon.com.invalid>
wrote:

> +1 on this as well. From what I have seen, standalone DAG processing
> results in a minor performance advantage and, importantly, makes the
> Scheduler loop more resilient to DAG processor crashes.
>
> Shubham
>
> On 2025-01-09, 4:02 PM, "Daniel Imberman" <daniel.imber...@gmail.com> wrote:
>
>
> I'm +1 on this.
>
>
> The fact that there's one more thing to deploy isn't that big of an issue
> given the number of pre-configurable options mentioned (e.g. helm), and a
> full logical separation of DAG parsing and scheduling makes sense. One
> thing that has been a longstanding issue with Airflow is the scheduler
> "doing too many things", so it would be nice to create a clean divide
> here.
>
>
> On Thu, Jan 9, 2025 at 3:28 PM Jed Cunningham <j...@astronomer.io.invalid>
> wrote:
>
>
> > Hello everyone!
> >
> > As I've been working on parsing lately, I want to propose a change in
> > that area in time for Airflow 3.
> >
> > Today there are 2 different ways the DAG processor can be run in
> > Airflow - as a standalone component, or embedded in the scheduler. The
> > standalone option came in 2.3; prior to that, the only option was
> > running it embedded in the scheduler.
> >
> > Why standalone? Generally speaking, parsing scales vertically (single
> > loop - more concurrent parsing) while scheduling scales horizontally
> > (many loops). As the DAG processor and scheduler scale in different
> > manners, it's awkward to have them live in the same component. There is
> > also a resiliency aspect here: no noisy neighbor issues.
> >
> > Really, the only positive of the embedded option is that it's easier to
> > deploy, as there is 1 less component to worry about. However, we already
> > have a number of components, so 1 more isn't that cumbersome. Anyone
> > using breeze, standalone, the helm chart, or a vendor won't be impacted
> > much by this change - in fact, having the log stream separate is a big
> > positive!
> >
> > We'd also be able to remove a bit of complexity around reinitialising a
> > bunch of stuff in the child process.
> >
> > Overall, I see primarily positives with this change, and a major version
> > upgrade is the perfect time to simplify this part of Airflow. Thoughts?
> >
> > Jed
> >
>
>
>
>
