bbovenzi commented on code in PR #30129:
URL: https://github.com/apache/airflow/pull/30129#discussion_r1137451818
##########
airflow/models/dag.py:
##########
@@ -2215,29 +2215,29 @@ def _deepcopy_task(t) -> Operator:
def filter_task_group(group, parent_group):
"""Exclude tasks not included in the subdag from the given TaskGroup."""
- copied = copy.copy(group)
- copied.used_group_ids = set(copied.used_group_ids)
- copied._parent_group = parent_group
-
- copied.children = {}
+ memo = {id(group.children): {}}
+ if parent_group:
+ memo[id(group.parent_group)] = parent_group
+ copied = copy.deepcopy(group, memo)
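For context on the new code: `copy.deepcopy(x, memo)` looks up `id(obj)` in `memo` for every object it reaches and, on a hit, returns the stored value instead of copying it. Seeding `memo` with `id(group.children)` therefore gives the copy a fresh empty children dict, and seeding `id(group.parent_group)` makes the copy point at the supplied parent. A minimal sketch, using a hypothetical `Node` class and `copy_detached` helper as illustrative stand-ins (not Airflow's `TaskGroup` code):

```python
import copy


class Node:
    """Hypothetical stand-in for a TaskGroup-like object (not Airflow's class)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self


def copy_detached(node, new_parent):
    # deepcopy checks memo[id(original)] before copying; on a hit it reuses
    # the stored object instead of recursing into the original.
    memo = {id(node.children): {}}  # the copy gets an empty children dict
    if new_parent is not None:
        memo[id(node.parent)] = new_parent  # the copy points at new_parent
    return copy.deepcopy(node, memo)


root = Node("root")
child = Node("child", parent=root)
Node("grandchild", parent=child)

new_root = Node("new_root")
copied = copy_detached(child, new_root)
print(copied.children)            # {}
print(copied.parent is new_root)  # True
print(copied is child)            # False
```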
Review Comment:
Testing on a very large DAG, I don't think this memo is working right: it went from 127ms to 22s.
Before:
<img width="802" alt="Screenshot 2023-03-15 at 12 58 21 PM" src="https://user-images.githubusercontent.com/4600967/225384422-22b003c0-67d6-4219-8c7d-2e543f9f80c1.png">
After:
<img width="794" alt="Screenshot 2023-03-15 at 12 55 43 PM" src="https://user-images.githubusercontent.com/4600967/225383299-c172fb50-30d0-4e55-b8e5-5bd224082203.png">
DAG:
```python
from datetime import datetime

from airflow.models.dag import DAG
from airflow.operators.dummy import DummyOperator
from airflow.decorators import task_group

# 100 task groups x 20 tasks each = 2,000 tasks in total.
with DAG(
    "wide_dummy",
    schedule_interval=None,
    start_date=datetime(2021, 1, 1),
    catchup=True,
) as wide_dummy:
    for i in range(100):

        @task_group(group_id=f"group-{i}")
        def group():
            for t in range(10):
                DummyOperator(task_id=f"out_{i}_{t}")
                DummyOperator(task_id=f"out2_{i}_{t}")

        group()
```
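A hypothesis for the 127ms to 22s jump, not confirmed in this thread: the seeded memo only short-circuits the ids it was given, so anything else the group references is still deep-copied on every call, whereas the removed `copy.copy` version shared those references. Whether Airflow's `TaskGroup` actually holds such references (for example to its DAG or its tasks) is an assumption to verify. The sketch below uses hypothetical `Registry`, `Group`, `copy_with_seeded_memo` and `copy_shallow` names, none of which are Airflow APIs, and counts how often a large shared object gets copied under each approach:

```python
import copy


class Registry:
    """Hypothetical large shared object (think: a back-reference to a whole DAG)."""

    deep_copies = 0  # how many times deepcopy actually copied a Registry

    def __init__(self, size):
        self.items = list(range(size))

    def __deepcopy__(self, memo):
        Registry.deep_copies += 1
        clone = Registry(0)
        memo[id(self)] = clone  # register before recursing, as deepcopy expects
        clone.items = copy.deepcopy(self.items, memo)
        return clone


class Group:
    """Hypothetical TaskGroup-like node that references the shared registry."""

    def __init__(self, registry, parent=None):
        self.registry = registry
        self.parent = parent
        self.children = {}


def copy_with_seeded_memo(group, new_parent):
    # Same shape as the new code: only two ids are short-circuited.
    memo = {id(group.children): {}}
    if new_parent is not None:
        memo[id(group.parent)] = new_parent
    return copy.deepcopy(group, memo)


def copy_shallow(group, new_parent):
    # Same shape as the removed code: shallow copy, then reset fields.
    copied = copy.copy(group)
    copied.parent = new_parent
    copied.children = {}
    return copied


shared = Registry(1_000)
root = Group(shared)
groups = [Group(shared, parent=root) for _ in range(100)]
new_root = Group(shared)

seeded = [copy_with_seeded_memo(g, new_root) for g in groups]
seeded_count = Registry.deep_copies
print(seeded_count)  # 100: one full registry copy per group

Registry.deep_copies = 0
shallow = [copy_shallow(g, new_root) for g in groups]
shallow_count = Registry.deep_copies
print(shallow_count)  # 0: shallow copies keep pointing at the shared registry
```

In this toy model, either mapping such shared objects to themselves in the memo (`memo[id(obj)] = obj`) or going back to the shallow copy with explicit field resets avoids the repeated copies.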