yuqian90 opened a new pull request #11184: URL: https://github.com/apache/airflow/pull/11184
This PR improves the UI response time when clearing dozens of DagRuns of large DAGs (thousands of tasks) that contain many `ExternalTaskSensor` + `ExternalTaskMarker` pairs. In the current implementation, clearing tasks can be slow, especially if the user chooses to clear with Future, Downstream and Recursive all selected.

There are two major improvements:

- `dag.sub_dag()` no longer deep copies `_task_group`, because doing so is a waste of time. Instead, `_task_group` is handled like `dag.task_dict`: it is set to None first and then copied explicitly.
- The `TaskInstance`s already visited are passed down the recursive calls of `dag.clear()` as `visited_external_tis`, so they are not processed again.

This speeds up the example in `test_clear_overlapping_external_task_marker` almost fivefold. For real large DAGs containing 500 tasks set up in a similar manner, the time it takes to clear 30 DagRuns is cut from around 100s to less than 10s.
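For readers who want a feel for the two ideas, here is a minimal, self-contained sketch. It is not the code from this PR: `ToyDag`, `task_group` (as a plain dict) and `clear_recursive` are hypothetical stand-ins, and the real `DAG` class is far more involved. The first part pre-seeds the `copy.deepcopy` memo so that the large containers come out as None and are then rebuilt explicitly; the second part threads a shared visited set through the recursion so that a task instance reachable via several markers is handled only once.

```python
import copy


class ToyDag:
    """Hypothetical stand-in for a DAG; not Airflow's real class."""

    def __init__(self, task_ids):
        self.task_dict = {tid: {"task_id": tid} for tid in task_ids}
        self.task_group = {"children": dict(self.task_dict)}

    def sub_dag(self, keep):
        # Pre-seed the deepcopy memo: any object whose id() is already in
        # the memo is replaced by the stored value (None here) instead of
        # being walked recursively.
        memo = {id(self.task_dict): None, id(self.task_group): None}
        dag = copy.deepcopy(self, memo)
        # Rebuild explicitly, copying only the tasks the subset needs.
        dag.task_dict = {
            tid: copy.copy(task)
            for tid, task in self.task_dict.items()
            if tid in keep
        }
        dag.task_group = {"children": dict(dag.task_dict)}
        return dag


def clear_recursive(ti_key, external_links, visited_external_tis=None, cleared=None):
    """Follow marker links from ti_key, handling each task instance once.

    external_links maps a task-instance key to the keys it points at
    (an assumption made for this sketch only).
    """
    if visited_external_tis is None:
        visited_external_tis = set()
    if cleared is None:
        cleared = []
    visited_external_tis.add(ti_key)
    cleared.append(ti_key)  # stand-in for the real clearing work
    for child in external_links.get(ti_key, ()):
        # The same set is shared by every recursive call, so overlapping
        # paths do not repeat work.
        if child not in visited_external_tis:
            clear_recursive(child, external_links, visited_external_tis, cleared)
    return cleared


if __name__ == "__main__":
    sub = ToyDag(["t1", "t2", "t3"]).sub_dag({"t1"})
    print(sorted(sub.task_dict))

    # Two markers in dag_a both lead, via different DAGs, to ("dag_d", "t").
    links = {
        ("dag_a", "t"): [("dag_b", "t"), ("dag_c", "t")],
        ("dag_b", "t"): [("dag_d", "t")],
        ("dag_c", "t"): [("dag_d", "t")],
    }
    print(clear_recursive(("dag_a", "t"), links))
```

In the overlapping layout at the bottom, the shared set means `("dag_d", "t")` is handled once even though two paths lead to it. Without the set, the repeated work would grow with every additional overlapping marker.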
