GitHub user potiuk added a comment to the discussion: DAG Parsing Performance 
with Large Number of Dynamically Generated DAGs

> 1. What's the recommended approach to handle DAG parsing for large numbers of 
> dynamically generated DAGs?

Make your DAG files parse as fast as possible, with as little top-level overhead as you can: 
https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code
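In practice this means keeping module-level code cheap: anything at the top level of a DAG file runs on every parse cycle. A minimal, hedged sketch of the pattern (plain dicts stand in for Airflow `DAG` objects so the sketch is self-contained, and the config file name is hypothetical):

```python
import json
from pathlib import Path

# Anti-pattern (do NOT do this at module top level): a network or database
# call here would run on every single parse cycle of this file.
# configs = requests.get("https://internal-api/dag-configs").json()

# Better: read a small local file refreshed out-of-band (e.g. by CI),
# so each parse only pays for a cheap local read.
def load_configs(path="dag_configs.json"):  # hypothetical file name
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())
    # Static in-module data is also cheap; used here as an illustrative fallback.
    return [{"dag_id": f"generated_dag_{i}", "schedule": "@daily"}
            for i in range(3)]

def build_dag(cfg):
    # In real Airflow this would construct a DAG object; a dict stands in here.
    return {"dag_id": cfg["dag_id"], "schedule": cfg["schedule"]}

# Cheap module-level work: iterate a small in-memory list.
dags = {cfg["dag_id"]: build_dag(cfg) for cfg in load_configs()}
```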

> 2. Are there built-in mechanisms in Airflow to partition DAG file processing 
> to improve parsing performance?

This is coming in Airflow 3 in the form of DAG bundles: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=294816356

> 3. How do other organizations handle similar scale with dynamic DAG 
> generation?

* Optimizing your parsing time is the best option
* Run several standalone DAG file processors and split the processing with the 
`--subdir` flag - split your DAGs by subdirectory and parse each 
subdirectory with one or more separate DAG file processors
* Split your Airflow installation into several independent ones
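The `--subdir` split can be wired up roughly like this (a hedged sketch, not a complete deployment: the directory names are hypothetical, and in Airflow 2.x the standalone DAG processor must be enabled, e.g. via `[scheduler] standalone_dag_processor = True`; each processor would normally run under a supervisor rather than backgrounded with `&`):

```shell
# Each standalone DAG file processor parses only its own subtree,
# so the two sets of DAG files are processed in parallel and independently.
airflow dag-processor --subdir /opt/airflow/dags/team_a &
airflow dag-processor --subdir /opt/airflow/dags/team_b &
```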


GitHub link: 
https://github.com/apache/airflow/discussions/44727#discussioncomment-11591922
