fshehadeh opened a new issue, #56446:
URL: https://github.com/apache/airflow/issues/56446

   ### Description
   
   We run Airflow 3 in an AWS ECS cluster. We noticed that the DAG processor 
constantly uses all of the CPU available to its container and continuously 
reloads our DAGs. Our setup has about 50 Python files, which contain the 
definitions for about 43 DAGs. Processing takes about one second per file on 
average, and the longest processing time is 3 seconds.
   
   Looking closer at the logs, we noticed that the DAGs are continuously being 
parsed and loaded into the DB. We want Airflow to quickly detect and reflect 
changes to our DAGs, so we use 60 seconds for min_file_process_interval. 
However, given that we don't change the files that often, we wanted an 
optimization that skips processing DAG files when we can tell for sure that 
they have not changed. We noticed the introduction of DAG bundles, and the 
bundle versioning that the Git DAG bundles use but the local ones do not. 
We decided to take this approach further:
   1. Calculate a checksum for the folder of the local DAG bundle, and use that 
as the bundle version (similar to the Git commit ID).
   2. Track the bundle version for each file as it is parsed. When it is time 
to populate the file queue, compare the current bundle version with the 
version recorded the last time the file was parsed. If the version is the 
same, then the DAG Python file (and any locally imported common Python code) 
has not changed, and therefore we can skip processing it.
   3. Add a maximum age, after which we force the processing of the DAG files 
(in case there are important side effects of loading them).
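   
   The three items above can be sketched roughly as follows. This is only an 
illustration of the idea, not the code we run and not Airflow's API: 
`folder_checksum`, `ParseTracker`, and `max_age_seconds` are hypothetical 
names, and the sketch assumes that hashing every `.py` file under the bundle 
root is an acceptable stand-in for a bundle version.

```python
import hashlib
import time
from pathlib import Path
from typing import Optional


def folder_checksum(bundle_root: Path) -> str:
    """Item 1 (hypothetical helper): hash the relative path and content of
    every .py file under bundle_root to get a version string."""
    digest = hashlib.sha256()
    for path in sorted(bundle_root.rglob("*.py")):
        if not path.is_file():
            continue
        digest.update(path.relative_to(bundle_root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


class ParseTracker:
    """Items 2 and 3 (hypothetical class): remember the bundle version and
    time of the last parse for each file."""

    def __init__(self, max_age_seconds: float) -> None:
        self.max_age_seconds = max_age_seconds
        # rel_path -> (bundle_version, parsed_at)
        self._last_parsed = {}

    def record_parse(self, rel_path: str, bundle_version: str,
                     now: Optional[float] = None) -> None:
        parsed_at = time.monotonic() if now is None else now
        self._last_parsed[rel_path] = (bundle_version, parsed_at)

    def should_parse(self, rel_path: str, bundle_version: str,
                     now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        entry = self._last_parsed.get(rel_path)
        if entry is None:
            return True  # never parsed before
        last_version, parsed_at = entry
        if last_version != bundle_version:
            return True  # something in the bundle folder changed
        # Same version: only re-parse once the maximum age is reached.
        return current - parsed_at >= self.max_age_seconds
```

   When populating the file queue, a file would be added only if 
`should_parse` returns true for the current checksum. Because the checksum 
covers the whole folder, a change to any file causes every file to be 
re-parsed, which is what makes it safe for locally imported common code.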
   
   After making these changes, CPU usage dropped significantly. While item 1 
above is specific to local DAG bundles, I think items 2 and 3 can be beneficial 
even for Git bundles.
   
   ### Use case/motivation
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

