SamWheating commented on issue #18023: URL: https://github.com/apache/airflow/issues/18023#issuecomment-914490044
After further investigation, I think most of the performance issues we saw were caused by the bug now fixed in https://github.com/apache/airflow/pull/17945. If needed, I can run some experiments to test the effect of having thousands of `queued` DagRuns on scheduler latency, but I don't think it's as severe as I once thought.

However, I still think it's a good idea to limit the number of queued DagRuns. Creating thousands of queued runs in advance (specifically for DAGs with a much earlier start date and `catchup=True`) can lead to some surprising behaviour:

1) If the `end_date` is changed to an earlier date, queued runs after the new end date may already have been created.
2) If the schedule interval is changed, the change will not affect queued DagRuns that were already created.
3) If there's a lifecycle policy on data in the DB, creating a DagRun potentially weeks before it actually runs could cause the queued run to be dropped before it is ever run.
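To illustrate the idea of a cap, here is a minimal sketch of how a scheduler pass could decide which queued runs to create. It assumes a fixed `timedelta` schedule interval, and the function `plan_queued_runs` and its `max_queued` parameter are hypothetical names for illustration only, not Airflow's actual API or config. Because `end_date` and `interval` are read on every pass and only a few runs are created ahead of time, a changed end date or schedule applies to runs that haven't been created yet.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def plan_queued_runs(
    next_logical_date: datetime,
    interval: timedelta,
    end_date: Optional[datetime],
    already_queued: int,
    max_queued: int,
) -> List[datetime]:
    """Hypothetical helper: return logical dates of new runs to queue.

    The total number of queued runs (existing + new) never exceeds
    max_queued, and no run is planned after end_date.
    """
    # How many more runs we may queue before hitting the cap.
    slots = max(0, max_queued - already_queued)
    dates: List[datetime] = []
    current = next_logical_date
    while len(dates) < slots:
        # Re-check end_date on every pass, so a lowered end_date is respected
        # for runs that have not been created yet.
        if end_date is not None and current > end_date:
            break
        dates.append(current)
        current += interval
    return dates


if __name__ == "__main__":
    start = datetime(2021, 1, 1)
    day = timedelta(days=1)
    # A catchup DAG with a cap of 3 queues only three runs per pass.
    planned = plan_queued_runs(start, day, None, already_queued=0, max_queued=3)
    print([d.date().isoformat() for d in planned])
    # ['2021-01-01', '2021-01-02', '2021-01-03']
```

Subsequent passes would call the helper again as queued runs start, so at most `max_queued` runs ever exist ahead of execution.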
