bbovenzi commented on issue #26059: URL: https://github.com/apache/airflow/issues/26059#issuecomment-1468742220
Did a bunch of testing. To replicate: clear any task in a task group that contains a join node. In our example_dags, `example_task_group.section_2.task_1` shows this well. The join node disappears after clearing the task, and since the node is missing we can no longer connect the two task groups, so the graph breaks.

This happens because clear calls `dag.partial_subset()`, and that function is not properly copying the `dag.task_group` in its memo [here](https://github.com/apache/airflow/blob/main/airflow/models/dag.py#L2183). Removing the task_group memo and calling `filter_task_group` [here](https://github.com/apache/airflow/blob/main/airflow/models/dag.py#L2240) with the copied `dag.task_group` instead of `self.task_group` works, but it is significantly slower for large DAGs (2000+ tasks). I'm not quite sure of the best way to fix our deep copy memo.

We also use `partial_subset` for filtering upstream/downstream, so you can replicate the bug that way too.
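For anyone unfamiliar with how a `copy.deepcopy` memo can skip an object, here is a minimal standalone sketch. It does not use Airflow: `Group` and `Graph` are hypothetical toy classes, and the shallow-copy step is only an assumption about how a skipped object might end up shared, not a reproduction of what `partial_subset` does.

```python
import copy


class Group:
    """Toy stand-in for a task group (hypothetical, not Airflow's class)."""

    def __init__(self, name):
        self.name = name
        self.children = {}


class Graph:
    """Toy stand-in for a DAG that owns a root group (hypothetical)."""

    def __init__(self):
        self.root = Group("root")


original = Graph()
original.root.children["join"] = "join-node"

# Pre-seeding the memo with id(obj) -> None makes deepcopy treat obj as
# "already copied" and substitute None instead of copying it.
memo = {id(original.root): None}
skipped = copy.deepcopy(original, memo)
assert skipped.root is None

# Assumption for illustration: the group is then rebuilt from a shallow copy.
# A shallow copy shares the children dict, so filtering the copy also
# removes the node from the original.
rebuilt = copy.copy(original.root)
rebuilt.children.pop("join")
assert "join" not in original.root.children

# A full deep copy (no memo entry) is independent, at the cost of copying
# every child, which is where the slowdown on large graphs would come from.
original.root.children["join"] = "join-node"
full = copy.deepcopy(original)
full.root.children.pop("join")
assert "join" in original.root.children
```

The trade-off in the toy matches the one described above: skipping the copy is cheap but risks shared state, while copying everything is safe but scales with the size of the group.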
