AutomationDev85 opened a new pull request, #30704:
URL: https://github.com/apache/airflow/pull/30704

   Hi Airflow community,
   this is my second PR, and I am happy to work on the scheduler runtime again.
We faced slow scheduler execution times caused by having millions of queued
dag_runs for one DAG.
   
   This PR adds caching of the DAG at two points in the code. This saves a
lot of scheduler runtime when scheduling many dag_runs for the same DAG. The
code currently reads the DAG from the DB each time it is needed, and if you
have a lot of short-running tasks this read is executed very often. E.g. we
wanted to schedule a DAG with:
   max_active_tasks=60,
   max_active_runs=180,
   and most tasks with an execution time of 2 sec and 1 million in queued
state. With the caching we were able to increase scheduler performance:
querying the DAG on our slow DB took between 50 ms and 250 ms, so it makes a
big difference whether this query is executed once or 60 times during one
scheduler loop run.
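
   The caching idea can be illustrated with a minimal sketch. This is not the
code from this PR: `PerLoopDagCache`, `fake_db_load` and `schedule_loop` are
hypothetical names invented for illustration, and the loader only stands in
for the real DB query. The sketch assumes a cache that lives for one scheduler
loop run, so a DAG is read at most once per loop.

```python
from typing import Callable, Dict, List


class PerLoopDagCache:
    """Hypothetical helper: memoizes DAG lookups for one scheduler loop run."""

    def __init__(self, loader: Callable[[str], object]) -> None:
        self._loader = loader               # stand-in for the slow DB query
        self._cache: Dict[str, object] = {}
        self.db_reads = 0                   # counts how often the loader ran

    def get(self, dag_id: str) -> object:
        # Only hit the loader the first time a dag_id is requested.
        if dag_id not in self._cache:
            self._cache[dag_id] = self._loader(dag_id)
            self.db_reads += 1
        return self._cache[dag_id]


def fake_db_load(dag_id: str) -> dict:
    """Hypothetical loader; a real one would query the metadata DB."""
    return {"dag_id": dag_id}


def schedule_loop(dag_ids: List[str], loader: Callable[[str], object]) -> int:
    """Simulate one loop run and return the number of loader calls."""
    cache = PerLoopDagCache(loader)         # fresh cache for every loop run
    for dag_id in dag_ids:
        cache.get(dag_id)
    return cache.db_reads


# 60 lookups of the same DAG in one loop run need a single read.
reads = schedule_loop(["my_dag"] * 60, fake_db_load)
```

   With the numbers above, 60 uncached reads at 50 ms to 250 ms each would
cost roughly 3 s to 15 s per loop run, while a single cached read costs
0.05 s to 0.25 s. Creating a fresh cache per loop run is an assumption of this
sketch; it keeps the cached DAG from going stale across loop runs.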
   
   @vandonr-amz fyi, as discussed with @jens-scheffler-bosch

