fshehadeh commented on issue #56446: URL: https://github.com/apache/airflow/issues/56446#issuecomment-3403608600
@ephraimbuddy: This is just from looking at the DAG processor log. I added a log message that is emitted whenever a child task is created to process a given file, and I noticed that we get that message for each of our DAGs every 60 seconds (the default value configured for min_file_process_interval). We can also see that CPU usage on the DAG processor container is constantly at its peak, so parsing the Python files for the DAGs appears to be resource-intensive.

If a DAG file (and the libraries it imports) has not changed, there is no point in creating a task to parse that DAG every 60 seconds only to find that the parsed DAG is the same. By calculating and tracking a checksum, we can tell when nothing has changed and avoid spawning a child task for each DAG every 60 seconds. I will try to create a PR explaining what we are trying to accomplish.
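A minimal sketch of the checksum idea follows, assuming the DAG processor keeps an in-memory map from each DAG file path to the SHA-256 digest seen on the previous check. DagFileChecksumTracker and should_reparse are hypothetical names used only for illustration; they are not part of Airflow. Note that this sketch hashes only the DAG file itself, so a change in an imported library would go unnoticed unless those files were also included in the hash.

```python
import hashlib
import tempfile
from pathlib import Path


class DagFileChecksumTracker:
    """Hypothetical helper (not an Airflow API): remembers each DAG file's last checksum."""

    def __init__(self) -> None:
        # Maps a DAG file path to the SHA-256 hex digest seen on the previous check.
        self._checksums: dict[str, str] = {}

    @staticmethod
    def compute_checksum(path: str) -> str:
        # Hash the raw file bytes in 64 KiB chunks to avoid loading large files at once.
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def should_reparse(self, path: str) -> bool:
        # True when the file is new or its bytes changed since the last check;
        # False means spawning a parsing child process could be skipped.
        # Assumption: only the DAG file's own bytes are hashed, not its imports.
        checksum = self.compute_checksum(path)
        if self._checksums.get(path) == checksum:
            return False
        self._checksums[path] = checksum
        return True


if __name__ == "__main__":
    tracker = DagFileChecksumTracker()
    with tempfile.TemporaryDirectory() as tmp:
        dag_file = Path(tmp) / "example_dag.py"
        dag_file.write_text("# DAG v1\n")
        print(tracker.should_reparse(str(dag_file)))  # True: first time seen
        print(tracker.should_reparse(str(dag_file)))  # False: unchanged
        dag_file.write_text("# DAG v2\n")
        print(tracker.should_reparse(str(dag_file)))  # True: contents changed
```

In this sketch the checksum is recorded as soon as the check runs; a real implementation might prefer to record it only after the file has been parsed successfully, so that a file that failed to parse is retried on the next interval.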
