Yep. I second Ash. There were enormous changes under the hood in
Airflow 2, especially when it comes to performance. A lot of the
performance assumptions and problems from 1.10 no longer hold in
Airflow 2, so you might want to run your DAGs through Airflow 2 to
find out how they behave now.

On Fri, Dec 17, 2021 at 11:13 AM Ash Berlin-Taylor <[email protected]> wrote:
>
> We have massively reworked (and benchmarked) verify_integrity as part of the
> HA work (including using a dummy sample of your large DAG structure provided
> by Kevin) since the 1.10.4 version, and it is no longer the bottleneck it
> once was. From memory, this was mostly fixed around 1.10.12 by improving the
> queries issued.
>
> We have done performance benchmarks of 1000 concurrent dags with 1000 tasks 
> each and verify_integrity barely showed up on the profile.
>
> -ash
>
> On Thu, Dec 16 2021 at 21:40:47 -0800, Ping Zhang <[email protected]> wrote:
>
> Hi Airflow community,
>
> While reading the latest Airflow main branch, I noticed that dag run
> creation, including the TI creation in verify_integrity, was moved from the
> `DagFileProcessorManager` loop to the scheduling loop (in `_do_scheduling`).
> I would like to learn more about the context behind this.
>
> In our production environment (Airbnb), we have a metric showing that
> `verify_integrity` is very expensive for new dag runs: it can take ~47
> seconds for a single dag run of our large DAG (~20K tasks; a few dozen of our
> DAGs reach this size) on an AWS db.r5.16xlarge. Even though we have
> optimized it down to ~17 seconds (we will open source this soon), it is
> still very expensive.
>
> This will greatly hurt scheduling performance and lower overall throughput
> for large clusters. Creating dag runs for all dags_needing_dagruns inside the
> scheduling loop can exacerbate scheduling delay, even though
> NUM_DAGS_PER_DAGRUN_QUERY is configurable.
>
> I would like to chat more about this.
>
> Best wishes
>
> Ping Zhang
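To make Ping's throughput concern concrete, here is a toy model (not Airflow code) of one scheduling-loop iteration that creates dag runs serially for at most `batch_size` DAGs. `batch_size` stands in for the per-loop cap the email calls NUM_DAGS_PER_DAGRUN_QUERY; `DagNeedingRun`, `loop_iteration_delay` and `iterations_until_scheduled` are hypothetical names, and the per-DAG cost is an assumed input (e.g. the ~17 s figure quoted above), not something measured here.

```python
"""Toy model (hypothetical, not Airflow's scheduler): how the cost of
creating dag runs and their TIs inside one loop iteration adds up."""
from dataclasses import dataclass
from typing import List


@dataclass
class DagNeedingRun:
    dag_id: str
    # Assumed wall-clock cost of creating the run plus verify_integrity.
    verify_integrity_seconds: float


def loop_iteration_delay(dags: List[DagNeedingRun], batch_size: int) -> float:
    """Seconds one iteration spends creating runs, assuming it handles the
    first `batch_size` queued DAGs serially and does nothing else."""
    return sum(d.verify_integrity_seconds for d in dags[:batch_size])


def iterations_until_scheduled(
    dags: List[DagNeedingRun], batch_size: int, target_dag_id: str
) -> int:
    """1-based iteration in which `target_dag_id` gets its run created,
    assuming a fixed queue order and `batch_size` DAGs per iteration."""
    for index, dag in enumerate(dags):
        if dag.dag_id == target_dag_id:
            return index // batch_size + 1
    raise KeyError(target_dag_id)
```

For example, with 30 large DAGs at an assumed 17 s each queued ahead of one small DAG and a cap of 10, each of the first three iterations spends 170 s just creating runs, and the small DAG only gets its run in iteration 4. The model deliberately ignores everything else the loop does, so it only illustrates how serial per-DAG cost compounds.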
