This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 7cd3fd68fbee97ab84420e70d3f17cd6a21b9e84
Author: Alan Ma <[email protected]>
AuthorDate: Sun Jan 9 13:58:26 2022 -0800

    Compare taskgroup and subdag (#20700)

    (cherry picked from commit 6b0c52898555641059e149c5ff0d9b46b2d45379)
---
 docs/apache-airflow/concepts/dags.rst | 43 +++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/docs/apache-airflow/concepts/dags.rst b/docs/apache-airflow/concepts/dags.rst
index 8aa4955..8d9b387 100644
--- a/docs/apache-airflow/concepts/dags.rst
+++ b/docs/apache-airflow/concepts/dags.rst
@@ -605,8 +605,47 @@ Some other tips when using SubDAGs:
 
 See ``airflow/example_dags`` for a demonstration.
 
-Note that :doc:`pools` are *not honored* by :class:`~airflow.operators.subdag.SubDagOperator`, and so
-resources could be consumed by SubdagOperators beyond any limits you may have set.
+
+.. note::
+
+    Parallelism is *not honored* by :class:`~airflow.operators.subdag.SubDagOperator`, and so resources could be consumed by SubdagOperators beyond any limits you may have set.
+
+
+
+TaskGroups vs SubDAGs
+----------------------
+
+SubDAGs, while serving a similar purpose as TaskGroups, introduces both performance and functional issues due to its implementation.
+
+* The SubDagOperator starts a BackfillJob, which ignores existing parallelism configurations potentially oversubscribing the worker environment.
+* SubDAGs have their own DAG attributes. When the SubDAG DAG attributes are inconsistent with its parent DAG, unexpected behavior can occur.
+* Unable to see the "full" DAG in one view as SubDAGs exists as a full fledged DAG.
+* SubDAGs introduces all sorts of edge cases and caveats. This can disrupt user experience and expectation.
+
+TaskGroups, on the other hand, is a better option given that it is purely a UI grouping concept. All tasks within the TaskGroup still behave as any other tasks outside of the TaskGroup.
+
+You can see the core differences between these two constructs.
+
++--------------------------------------------------------+--------------------------------------------------------+
+| TaskGroup                                              | SubDAG                                                 |
++========================================================+========================================================+
+| Repeating patterns as part of the same DAG             | Repeating patterns as a separate DAG                   |
++--------------------------------------------------------+--------------------------------------------------------+
+| One set of views and statistics for the DAG            | Separate set of views and statistics between parent    |
+|                                                        | and child DAGs                                         |
++--------------------------------------------------------+--------------------------------------------------------+
+| One set of DAG configuration                           | Several sets of DAG configurations                     |
++--------------------------------------------------------+--------------------------------------------------------+
+| Honors parallelism configurations through existing     | Does not honor parallelism configurations due to       |
+| SchedulerJob                                           | newly spawned BackfillJob                              |
++--------------------------------------------------------+--------------------------------------------------------+
+| Simple construct declaration with context manager      | Complex DAG factory with naming restrictions           |
++--------------------------------------------------------+--------------------------------------------------------+
+
+.. note::
+
+    SubDAG is deprecated hence TaskGroup is always the preferred choice.
+
 
 Packaging DAGs
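
The "same DAG" versus "separate DAG" rows of the table added in this commit can be illustrated without Airflow installed. The following is a toy model in plain Python and not Airflow's API: ``ToyDag``, ``task_group`` and ``sub_dag`` are hypothetical names invented for this sketch. It assumes only the behavior the new text describes: a TaskGroup is a grouping declared with a context manager whose tasks stay in the one DAG (modelled here as an ID prefix), while a SubDAG is built by a factory as a separate DAG whose ID follows the ``parent.child`` naming convention.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class ToyDag:
    """Hypothetical stand-in for a DAG: an ID plus a flat list of task IDs."""

    dag_id: str
    task_ids: list = field(default_factory=list)
    _prefix: str = ""

    def add_task(self, task_id):
        # Every task is registered on this DAG, with the active group prefix.
        self.task_ids.append(self._prefix + task_id)

    @contextmanager
    def task_group(self, group_id):
        # TaskGroup-style grouping: only the ID prefix changes,
        # the tasks still belong to the same DAG object.
        previous = self._prefix
        self._prefix = f"{previous}{group_id}."
        try:
            yield self
        finally:
            self._prefix = previous


def sub_dag(parent, child_id, task_ids):
    # SubDAG-style factory: builds a *separate* DAG named "parent.child".
    child = ToyDag(dag_id=f"{parent.dag_id}.{child_id}")
    for task_id in task_ids:
        child.add_task(task_id)
    # The parent only holds a single placeholder task for the whole child DAG.
    parent.add_task(child_id)
    return child


# TaskGroup style: one DAG, one flat set of tasks.
grouped = ToyDag("etl")
grouped.add_task("start")
with grouped.task_group("load"):
    grouped.add_task("orders")
    grouped.add_task("customers")

# SubDAG style: two DAG objects, each with its own task list.
parent = ToyDag("etl")
parent.add_task("start")
child = sub_dag(parent, "load", ["orders", "customers"])
```

In the grouped variant every task is visible on the single ``etl`` DAG, whereas the SubDAG variant leaves the parent with one opaque ``load`` task and moves the real work into a second DAG, which is where the separate views, statistics and configuration in the table come from.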
