zhiyong-ong commented on PR #27102:
URL: https://github.com/apache/airflow/pull/27102#issuecomment-1299602765
Sure. The tests were all done by clearing task instances with the Downstream and Recursive options enabled.
For a typical DAG structure, where we clear DAG runs across multiple days, we have something like:
- same dag_id
- multiple run_ids (total 6)
- multiple task_ids (total 1005)
- same map_index
(This is quite a common scenario for us: we regularly clear 1000+ tasks over multiple days for the same DAG.)
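The comment doesn't show the query change itself, so the following is only a hypothetical illustration of why shared key columns could matter. When many task instances share `dag_id`, `run_id` and `map_index`, their keys collapse into a handful of buckets, and each bucket could be matched with a single `task_id IN (...)` condition rather than one condition per task instance. The helper `group_keys`, the bucketing choice and the sample IDs are all invented for this sketch; it is not the PR's code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical key layout: (dag_id, run_id, task_id, map_index).
TIKey = Tuple[str, str, str, int]
GroupKey = Tuple[str, str, int]


def group_keys(keys: List[TIKey]) -> Dict[GroupKey, List[str]]:
    """Hypothetical helper: bucket task-instance keys by (dag_id, run_id, map_index).

    Each bucket could then be matched with one `task_id IN (...)` condition
    instead of one condition per task instance.
    """
    groups: Dict[GroupKey, List[str]] = defaultdict(list)
    for dag_id, run_id, task_id, map_index in keys:
        groups[(dag_id, run_id, map_index)].append(task_id)
    return dict(groups)


# Synthetic version of the typical case: one dag_id, 6 run_ids,
# 1005 task_ids, and map_index -1 (Airflow's value for unmapped tasks).
typical = [
    ("example_dag", f"run_{r}", f"task_{t}", -1)
    for r in range(6)
    for t in range(1005)
]
print(len(typical), "keys ->", len(group_keys(typical)), "groups")  # 6030 keys -> 6 groups
```

A full 6 × 1005 grid gives 6030 keys (the real run cleared 6029). In the second scenario, `map_index` and `dag_id` vary as well, so the same bucketing would leave many more, smaller groups; that may be part of why that case is less favorable, though the comment doesn't say so.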
I get the following benchmarks.
Original:
```
[2022-11-01 07:53:20,353] {views.py:2017} INFO - Number of tasks to clear: 6029
[2022-11-01 07:53:20,354] {views.py:2018} INFO - Time taken: 42.42555727700528
```
New:
```
[2022-11-01 07:11:24,196] {views.py:2017} INFO - Number of tasks to clear: 6029
[2022-11-01 07:11:24,196] {views.py:2018} INFO - Time taken: 5.101936740000383
```
For a less favorable case:
- multiple dag_ids (total 5)
- multiple run_ids (total 10)
- multiple task_ids (total 5)
- multiple map_index (total 100)
Original:
```
[2022-11-02 05:34:41,283] {views.py:2017} INFO - Number of tasks to clear: 326
[2022-11-02 05:34:41,284] {views.py:2018} INFO - Time taken: 17.914417002000846
```
New:
```
[2022-11-02 05:35:44,062] {views.py:2017} INFO - Number of tasks to clear: 326
[2022-11-02 05:35:44,062] {views.py:2018} INFO - Time taken: 3.5351223750039935
```
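To put the two benchmark pairs side by side, the speedup is the original time divided by the new time. A small stdlib-only sketch that pulls the `Time taken` values out of the log lines above (re-joined onto single lines) and computes that ratio; the constant and function names are just illustrative:

```python
import re

# "Time taken" log lines from the benchmarks above.
ORIGINAL_TYPICAL = "[2022-11-01 07:53:20,354] {views.py:2018} INFO - Time taken: 42.42555727700528"
NEW_TYPICAL = "[2022-11-01 07:11:24,196] {views.py:2018} INFO - Time taken: 5.101936740000383"
ORIGINAL_MANY_KEYS = "[2022-11-02 05:34:41,284] {views.py:2018} INFO - Time taken: 17.914417002000846"
NEW_MANY_KEYS = "[2022-11-02 05:35:44,062] {views.py:2018} INFO - Time taken: 3.5351223750039935"

TIME_TAKEN = re.compile(r"Time taken:\s*([0-9]+(?:\.[0-9]+)?)")


def time_taken(log_line: str) -> float:
    """Extract the numeric 'Time taken:' value from a log line."""
    match = TIME_TAKEN.search(log_line)
    if match is None:
        raise ValueError(f"no 'Time taken' value in: {log_line!r}")
    return float(match.group(1))


def speedup(original: str, new: str) -> float:
    """How many times faster the new run was than the original one."""
    return time_taken(original) / time_taken(new)


print(f"typical case:   {speedup(ORIGINAL_TYPICAL, NEW_TYPICAL):.1f}x")      # 8.3x
print(f"many-keys case: {speedup(ORIGINAL_MANY_KEYS, NEW_MANY_KEYS):.1f}x")  # 5.1x
```

So the typical case clears the same 6029 task instances roughly 8.3 times faster, and the less favorable case clears its 326 roughly 5.1 times faster.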
(On a side note, there appears to be a bug when clearing mapped tasks in a downstream DAG. You can reproduce it by creating two DAGs, one dependent on the other, with mapped tasks in the child DAG: clearing the parent DAG won't clear the child DAG's mapped tasks. It's a small fix, but do let me know if that's the intended behaviour.)