Yep. I second Ash. There were enormous changes under the hood in Airflow 2, especially around performance. Many of the performance assumptions and problems from 1.10 no longer hold in Airflow 2, so you might want to run your DAGs through Airflow 2 to find out how they behave now.
On Fri, Dec 17, 2021 at 11:13 AM Ash Berlin-Taylor <[email protected]> wrote:
>
> We have massively reworked (and benchmarked) verify_integrity as part of the
> HA work (including using a dummy sample of your large DAG structure provided
> by Kevin) since version 1.10.4, and it is no longer the bottleneck it once
> was. From memory, this was mostly fixed around 1.10.12 by improving the
> queries issued.
>
> We have run performance benchmarks of 1000 concurrent DAGs with 1000 tasks
> each, and verify_integrity barely showed up in the profile.
>
> -ash
>
> On Thu, Dec 16 2021 at 21:40:47 -0800, Ping Zhang <[email protected]> wrote:
>
> Hi Airflow community,
>
> While reading the latest Airflow main branch, I noticed that DAG run
> creation, including the TI creation in verify_integrity, was moved from the
> `DagFileProcessorManager` loop into the scheduling loop (in _do_scheduling).
> I would like to learn more about the context behind this.
>
> In our production environment (Airbnb), we have a metric showing that
> `verify_integrity` is very expensive for new DAG runs: it can take ~47
> seconds for a single DAG run of our large DAG (~20K tasks; a few dozen of
> our DAGs reach this size) on an AWS db.r5.16xlarge. Even though we have
> optimized it down to ~17 seconds (we will open source this soon), it is
> still very expensive.
>
> This will greatly hurt scheduling performance and lower overall throughput
> for large clusters. Creating DAG runs for all dags_needing_dagruns in the
> scheduling loop can exacerbate the scheduling delay, even though
> NUM_DAGS_PER_DAGRUN_QUERY is configurable.
>
> I would like to chat more about this.
>
> Best wishes
>
> Ping Zhang
