yuqian90 opened a new pull request #11184:
URL: https://github.com/apache/airflow/pull/11184


   This PR improves the UI response time when clearing dozens of 
DagRuns of large DAGs (thousands of tasks) containing many `ExternalTaskSensor` 
+ `ExternalTaskMarker` pairs. In the current implementation, clearing tasks can 
be slow, especially if the user clears with Future, Downstream and 
Recursive all selected. 
   
   This PR speeds it up. There are two major improvements:
   
   - `dag.sub_dag()` no longer deep copies `_task_group`, which was wasted 
work. Instead, `_task_group` is handled like `dag.task_dict`: it is set to 
None first and then copied explicitly.
   - Pass the `TaskInstance`s already visited down the recursive calls of 
`dag.clear()` as `visited_external_tis`. This speeds up the example in 
`test_clear_overlapping_external_task_marker` almost fivefold. 
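   
   The effect of passing a visited set down the recursion can be illustrated 
with a small standalone sketch. This is not the Airflow implementation: the 
graph, `clear_naive` and `clear_with_visited` are hypothetical names used only 
to show why overlapping markers cause repeated work and how a shared `visited` 
set avoids it.

```python
from typing import Dict, List, Optional, Set

# Hypothetical dependency graph: each key points at the nodes it would
# clear recursively, loosely mimicking overlapping ExternalTaskMarker chains.
GRAPH: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}


def clear_naive(node: str, graph: Dict[str, List[str]], calls: List[str]) -> Set[str]:
    """Recurse into every child, even if it was already handled elsewhere."""
    calls.append(node)
    cleared = {node}
    for child in graph[node]:
        cleared |= clear_naive(child, graph, calls)
    return cleared


def clear_with_visited(
    node: str,
    graph: Dict[str, List[str]],
    calls: List[str],
    visited: Optional[Set[str]] = None,
) -> Set[str]:
    """Same traversal, but a shared `visited` set is passed down the calls."""
    if visited is None:
        visited = set()
    if node in visited:
        # Already handled by an earlier branch of the recursion: skip it.
        return set()
    visited.add(node)
    calls.append(node)
    cleared = {node}
    for child in graph[node]:
        cleared |= clear_with_visited(child, graph, calls, visited)
    return cleared


naive_calls: List[str] = []
visited_calls: List[str] = []
naive_result = clear_naive("a", GRAPH, naive_calls)
visited_result = clear_with_visited("a", GRAPH, visited_calls)
```

   Both functions return the same set of nodes, but the naive version walks 
the shared `d -> e` tail once per path that reaches it, while the version with 
`visited` processes each node once.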
   
   For real, large DAGs containing 500 tasks set up in a similar manner, the 
time it takes to clear 30 DagRuns is cut from around 100s to less than 10s.
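
   The first improvement, skipping the deep copy of a large attribute and 
copying only what is needed, can be sketched with the `memo` argument of 
`copy.deepcopy`. This is a minimal illustration under assumptions, not the code 
in this PR: `Graph`, `subset` and the sample task data are hypothetical 
stand-ins for a DAG-like object.

```python
import copy
from typing import Dict, List, Set


class Graph:
    """Hypothetical stand-in for a DAG-like object with one large attribute."""

    def __init__(self, name: str, tasks: Dict[str, List[str]]) -> None:
        self.name = name
        self.task_dict = dict(tasks)


def subset(graph: Graph, keep: Set[str]) -> Graph:
    # Pre-seed the memo so deepcopy treats task_dict as "already copied"
    # and substitutes None instead of walking the whole structure.
    memo = {id(graph.task_dict): None}
    new_graph = copy.deepcopy(graph, memo)
    # At this point new_graph.task_dict is None; copy explicitly, and only
    # the entries that are actually wanted.
    new_graph.task_dict = {
        task_id: copy.copy(task)
        for task_id, task in graph.task_dict.items()
        if task_id in keep
    }
    return new_graph


full = Graph("example", {"t1": ["echo 1"], "t2": ["echo 2"], "t3": ["echo 3"]})
partial = subset(full, {"t1", "t3"})
```

   The original object is left untouched, and the cost of the copy scales with 
the number of kept entries rather than with the size of the whole attribute.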

