AutomationDev85 opened a new pull request, #30704: URL: https://github.com/apache/airflow/pull/30704
Hi Airflow community, this is my second PR, and I am happy to work on the scheduler runtime again.

We ran into slow scheduler execution when a single DAG had millions of queued dag_runs. This PR adds caching of the DAG at two points in the code, which saves a lot of scheduler runtime when scheduling many dag_runs for the same DAG. The code currently reads the DAG from the DB, and with many short-running tasks this read is executed very often.

For example, we wanted to schedule a DAG with max_active_tasks=60 and max_active_runs=180, where most tasks have an execution time of 2 seconds and 1 million dag_runs are in the queued state. With the caching we were able to increase scheduler performance: querying the DAG on our slow DB took between 50 ms and 250 ms, so it makes a big difference whether this query runs once or 60 times during one scheduler loop.

@vandonr-amz fyi, as discussed with @jens-scheffler-bosch
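To illustrate the idea, here is a simplified sketch of caching a DAG lookup for the duration of one scheduler loop. It is not the code in this PR and does not use Airflow's real API: `DagCache`, `fake_load_dag`, and `schedule_loop` are hypothetical names, and the DB read is replaced by a stand-in function. The latency estimate uses the 50 ms to 250 ms range and the 60 lookups per loop mentioned above.

```python
from typing import Callable, Dict, Iterable


class DagCache:
    """Memoizes DAG lookups for one loop run (hypothetical helper, not Airflow API)."""

    def __init__(self, loader: Callable[[str], object]) -> None:
        self._loader = loader
        self._dags: Dict[str, object] = {}
        self.db_reads = 0  # how many times the (slow) loader was actually called

    def get(self, dag_id: str) -> object:
        # Only hit the loader the first time a dag_id is requested.
        if dag_id not in self._dags:
            self._dags[dag_id] = self._loader(dag_id)
            self.db_reads += 1
        return self._dags[dag_id]


def fake_load_dag(dag_id: str) -> dict:
    # Stand-in for the DB query; assumed to cost 50-250 ms on a slow DB.
    return {"dag_id": dag_id}


def schedule_loop(dag_ids: Iterable[str], loader: Callable[[str], object]) -> int:
    """Simulates one scheduler loop and returns the number of DB reads."""
    cache = DagCache(loader)  # fresh cache per loop, so entries never outlive it
    for dag_id in dag_ids:
        cache.get(dag_id)
    return cache.db_reads


if __name__ == "__main__":
    lookups = ["my_dag"] * 60  # 60 lookups of the same DAG in one loop
    reads = schedule_loop(lookups, fake_load_dag)
    for label, count in (("uncached", len(lookups)), ("cached", reads)):
        # Estimated time spent on DAG queries, in milliseconds.
        print(f"{label}: {count} reads, {count * 50}-{count * 250} ms")
```

Creating the cache inside the loop function keeps the sketch simple: a DAG changed in the DB is picked up again on the next loop, at the cost of one read per DAG per loop.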
