Hi Airflow community,

While reading the latest Airflow main branch, I noticed that dag run
creation, including the ti creation in `verify_integrity`, was moved from
the `DagFileProcessorManager` loop to the scheduling loop (in
`_do_scheduling`). I would like to learn more about the context behind this.

In our production environment (Airbnb), we have a metric showing that
`verify_integrity` is very expensive for new dag runs: it can take ~47
seconds for a single dag run of a large dag (~20K tasks; we have a few
dozen dags reaching this size) on an AWS db.r5.16xlarge. Even though we
have optimized it down to ~17 seconds (we will open source this soon), it
is still very expensive.

Moving this work into the scheduling loop will greatly hurt scheduling
performance and lower the overall throughput for large clusters. Creating
dag runs for all `dags_needing_dagruns` in the scheduling loop can
exacerbate the scheduling delay even if `NUM_DAGS_PER_DAGRUN_QUERY` is
configurable.
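
To put rough numbers on this concern, here is a back-of-the-envelope
sketch. It assumes dag runs are created one after another within a single
scheduling loop iteration and that each large dag run costs about the same
~17 seconds; the function name is hypothetical and not part of Airflow.

```python
def estimated_loop_delay_seconds(num_large_dag_runs: int,
                                 seconds_per_dag_run: float) -> float:
    """Estimate the time one scheduling loop iteration spends creating dag runs.

    Assumption (not taken from the Airflow code): dag runs are created
    serially, so the cost is simply count * per-run cost.
    """
    if num_large_dag_runs < 0 or seconds_per_dag_run < 0:
        raise ValueError("inputs must be non-negative")
    return num_large_dag_runs * seconds_per_dag_run


if __name__ == "__main__":
    # ~17 s is the optimized per-dag-run cost; the counts are illustrative.
    for count in (1, 5, 10):
        delay = estimated_loop_delay_seconds(count, 17.0)
        print(f"{count} large dag run(s): ~{delay:.0f} s in one loop iteration")
```

Under these assumptions, even a handful of large dag runs created in the
same iteration would hold up the scheduling loop for a minute or more.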

I would like to chat more about this.

Best wishes

Ping Zhang
