fshehadeh commented on issue #56446:
URL: https://github.com/apache/airflow/issues/56446#issuecomment-3403608600

   @ephraimbuddy: Just from looking at the log of the DAG processor: I added a 
log message to tell me when a child task is created to process a given file. I 
noticed that we got a log message for each of our DAGs every 60 seconds (which 
is the default time configured for min_file_process_interval). We can also see 
that the CPU on the DAG processor container is always at its peak.
   
   Parsing the Python code for the DAGs seems to be heavy on resources. So, 
assuming that the DAG file (and its imported libraries) have not changed, there 
is no point in creating a task to parse the DAG every 60 seconds, only to find 
that the parsed DAG is the same. By calculating and tracking a checksum, we can 
tell when nothing has changed and avoid spawning a child task for each DAG 
every 60 seconds.
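   A minimal sketch of the checksum idea, assuming a hypothetical 
`DagFileChecksumTracker` helper (not an existing Airflow API): it hashes each 
DAG file's bytes and reports whether the file needs to be reparsed. Note that 
it only hashes the DAG file itself; detecting changes in imported libraries 
would need extra tracking that this sketch does not attempt.

```python
import hashlib
import tempfile
from pathlib import Path


class DagFileChecksumTracker:
    """Hypothetical helper: remembers the last-seen checksum of each DAG file."""

    def __init__(self) -> None:
        # Maps file path -> hex digest of the content seen at the last check.
        self._last_seen: dict[str, str] = {}

    @staticmethod
    def checksum(path: Path) -> str:
        # Hash the raw bytes of the file; SHA-256 is an arbitrary choice here.
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def needs_parse(self, path: Path) -> bool:
        """Return True if the file is new or its content changed since the last check."""
        digest = self.checksum(path)
        key = str(path)
        if self._last_seen.get(key) == digest:
            return False  # Unchanged: skip spawning a parsing task.
        self._last_seen[key] = digest
        return True


def demo() -> list[bool]:
    tracker = DagFileChecksumTracker()
    with tempfile.TemporaryDirectory() as tmp:
        dag_file = Path(tmp) / "example_dag.py"
        dag_file.write_text("# version 1\n")
        first = tracker.needs_parse(dag_file)   # New file -> True
        second = tracker.needs_parse(dag_file)  # Unchanged -> False
        dag_file.write_text("# version 2\n")
        third = tracker.needs_parse(dag_file)   # Content changed -> True
    return [first, second, third]


if __name__ == "__main__":
    print(demo())  # [True, False, True]
```

   With a check like this in front of the per-interval loop, an unchanged file 
would cost one read and one hash instead of a full child-process parse.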
   
   I will try to create a PR explaining what we are trying to accomplish.

