twindebank opened a new issue #11494:
URL: https://github.com/apache/airflow/issues/11494


   **Apache Airflow version**: 1.10.12
   **Environment**: Mac / Docker (official image) for Mac / Docker (official 
image) deployed to GCE
   
   
   ## What happened
   
   **There appears to be a 2-4x slowdown in DAG file processing times when 
comparing v1.10.11 to v1.10.12.** 
   
   We first noticed the issue when we upgraded our production deployment. We 
found that some DAGs were failing to schedule, and on inspecting the logs we 
worked out that the `AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT` limit was being 
hit. Looking through the `dag_processor_manager` logs, we found that DAGs were 
taking 3-4x longer to build in v1.10.12 compared to v1.10.11.
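
To illustrate why a slowdown of this size can push a DAG file past the limit, here is a minimal sketch. The default of 50 seconds is what I believe 1.10.x ships with for `dag_file_processor_timeout` and should be checked against your own config. The function names and the 15s baseline are hypothetical, not taken from our deployment.

```python
import os

# Assumption: believed to be the 1.10.x default for dag_file_processor_timeout.
DEFAULT_TIMEOUT_SECONDS = 50.0


def processor_timeout(env=os.environ):
    """Read the DAG file processor timeout (seconds) from the environment."""
    return float(
        env.get("AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT", DEFAULT_TIMEOUT_SECONDS)
    )


def would_time_out(processing_seconds, env=os.environ):
    """Return True when a file's processing time reaches the timeout."""
    return processing_seconds >= processor_timeout(env)


# Hypothetical numbers: a file that took 15s before, now 3.5x slower.
baseline_seconds = 15.0
print(would_time_out(baseline_seconds, env={}))        # under the assumed 50s limit
print(would_time_out(baseline_seconds * 3.5, env={}))  # 52.5s, over the limit
```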
   
   I set up a minimal test environment to replicate this issue locally with one 
randomly generated DAG containing 500 nodes. I first ran Airflow using the 
official docker images for quick results, then I ran Airflow directly on my 
machine once I confirmed the slowdown was present. The average 
`dag_processor_manager` timings were:
   ```
   Docker/1.10.11: 9.94s
   Docker/1.10.12: 27.84s
   pyenv/1.10.11: 9.56s
   pyenv/1.10.12: 24.67s
   ```
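
For reference, the slowdown factor implied by these averages can be worked out as below. It comes to roughly 2.8x for Docker and 2.6x for pyenv in this test setup. The dictionary layout and function name are only for illustration.

```python
# Average dag_processor_manager timings in seconds, copied from the table above.
timings = {
    "Docker": {"1.10.11": 9.94, "1.10.12": 27.84},
    "pyenv": {"1.10.11": 9.56, "1.10.12": 24.67},
}


def slowdown(old_seconds, new_seconds):
    """Return how many times slower the new version is than the old one."""
    return new_seconds / old_seconds


ratios = {
    env: round(slowdown(t["1.10.11"], t["1.10.12"]), 2)
    for env, t in timings.items()
}

for env, ratio in ratios.items():
    print(f"{env}: {ratio}x slower")
```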
   
   
   
   ## How to reproduce it
   
   The setup I used to test the issue is here: 
[airflow-slowdown-example.zip](https://github.com/apache/airflow/files/5370865/airflow-slowdown-example.zip).
   
   
   It contains scripts to run Airflow via docker or directly, a randomly 
generated DAG, and a script to scrape and average the `dag_processor_manager` 
logs.
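
For anyone who does not want to download the zip, a randomly generated DAG of this kind could be sketched roughly as follows. This is not the generator used in the attached example; the function names, the `random_dag_example` DAG id, and the limit of three upstream tasks per task are assumptions. The Airflow imports are the 1.10 module paths and are loaded lazily, so the edge generator runs without Airflow installed.

```python
import random


def random_dag_edges(n_tasks=500, max_upstream=3, seed=0):
    """Return (upstream, downstream) index pairs forming an acyclic graph.

    Each task only depends on tasks with a smaller index, so no cycle
    can be created.
    """
    rng = random.Random(seed)
    edges = []
    for task in range(1, n_tasks):
        n_upstream = rng.randint(1, min(max_upstream, task))
        for upstream in rng.sample(range(task), n_upstream):
            edges.append((upstream, task))
    return edges


def build_airflow_dag(edges, n_tasks=500):
    """Wire the edges into an Airflow 1.10 DAG made of DummyOperators."""
    # Imported here so the edge generator above works without Airflow.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    dag = DAG(
        "random_dag_example",  # hypothetical DAG id
        start_date=datetime(2020, 1, 1),
        schedule_interval=None,
    )
    tasks = [DummyOperator(task_id=f"task_{i}", dag=dag) for i in range(n_tasks)]
    for upstream, downstream in edges:
        tasks[downstream].set_upstream(tasks[upstream])
    return dag


edges = random_dag_edges()
print(f"{len(edges)} edges between 500 tasks")
```

In a DAG file you would assign `dag = build_airflow_dag(edges)` at module level so that Airflow picks it up.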
   
   To run the tests:
   1. Set the Airflow version in the `Makefile` to `v1.10.11`.
   2. Run `make build-pyenv` or `make build-docker`.
   3. Run `make run-pyenv` or `make run-docker`.
   4. Open the Airflow UI and run the DAG.
   5. Let it run for ~20 minutes.
   6. Run `make kill-pyenv` or `make kill-docker`.
   7. Repeat steps 1-6 for `v1.10.12`.
   8. Run the `summarise_logs.py` script to scrape the logs and calculate the 
average `dag_processor_manager` timings.
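
The averaging step can be sketched as below. This is not the `summarise_logs.py` from the zip, only an illustration of the idea. It assumes each relevant log line mentions the DAG file and that the last duration on the line (a number followed by `s`) is the runtime of interest. The sample lines are made up and do not reproduce the real log layout.

```python
import re
from statistics import mean

# Matches a duration such as "27.84s"; the log layout is an assumption.
DURATION = re.compile(r"(\d+(?:\.\d+)?)s\b")


def average_runtime(log_lines, dag_file):
    """Average the last duration found on each line that mentions dag_file."""
    runtimes = []
    for line in log_lines:
        if dag_file not in line:
            continue
        matches = DURATION.findall(line)
        if matches:
            runtimes.append(float(matches[-1]))
    return mean(runtimes) if runtimes else None


# Hypothetical sample lines, not copied from a real log.
sample = [
    "/dags/random_dag.py  1  0  9.50s  2020-10-13T10:00:00",
    "/dags/random_dag.py  1  0  10.50s  2020-10-13T10:01:00",
    "/dags/other.py  1  0  0.20s  2020-10-13T10:01:00",
]
print(average_runtime(sample, "random_dag.py"))
```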
   
   
   **Anything else we need to know**:
   
   In the test environment, the DAG processing time drops to about 2s for both 
versions when the DAG has no active DAG run. I think both the webserver and the 
scheduler invoke the DAG processor manager by default, so I wonder whether the 
slowdown is related to the scheduler only?



