dstandish commented on code in PR #32169:
URL: https://github.com/apache/airflow/pull/32169#discussion_r1258851711
##########
docs/apache-airflow/core-concepts/dags.rst:
##########
@@ -484,6 +484,125 @@ You can also combine this with the
:ref:`concepts:depends-on-past` functionality
.. image:: /img/branch_with_trigger.png
+.. _concepts:setup-and-teardown:
+
+Setup and Teardown
+~~~~~~~~~~~~~~~~~~
+
+In data workflows it's common to create a resource (such as a compute resource), use it to do some work, and then tear it down. Airflow provides setup and teardown tasks to support this need.
+
+Key features of setup and teardown tasks:
+
+ * If you clear a task, its setups and teardowns will be cleared.
+ * By default, teardown tasks are ignored for the purpose of evaluating dag run state.
+ * A teardown task will run if its setup was successful, even if its work tasks failed.
+ * Teardown tasks are ignored when setting dependencies against task groups.
+ * A setup task must always have a teardown and vice versa. You may use EmptyOperator as a setup or teardown.
+
+Basic usage
+"""""""""""
+
+Suppose you have a dag that creates a cluster, runs a query, and deletes the cluster. Without using setup and teardown tasks, you might set these relationships:
+
+.. code-block:: python
+
+ create_cluster >> run_query >> delete_cluster
+
+We can use the ``as_teardown`` method to let Airflow know that ``create_cluster`` is a setup task and that ``delete_cluster`` is its teardown:
+
+.. code-block:: python
+
+    create_cluster >> run_query >> delete_cluster.as_teardown(setups=create_cluster)
+
+Observations:
+
+ * If you clear ``run_query`` to run it again, then both ``create_cluster`` and ``delete_cluster`` will be cleared.
+ * If ``run_query`` fails, then ``delete_cluster`` will still run.
+ * The success of the dag run will depend on the success of ``run_query``.
+
+Setup "scope"
+"""""""""""""
+
+We require that a setup always have a teardown in order to have a well-defined scope. If you wish to add only a teardown task or only a setup task, you may use EmptyOperator as your "empty setup" or "empty teardown".
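+
+For example, a minimal sketch of a standalone teardown paired with an "empty setup" (``run_query`` and ``cleanup`` are placeholder tasks, and this assumes ``EmptyOperator`` is imported from ``airflow.operators.empty``):
+
+.. code-block:: python
+
+    # the empty setup exists only to give the teardown a well-defined scope
+    empty_setup = EmptyOperator(task_id="empty_setup")
+    empty_setup >> run_query >> cleanup.as_teardown(setups=empty_setup)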
+
+The "scope" of a setup will be determined by where the teardown is. Tasks
between a setup and its teardown are in the "scope" of the setup / teardown
pair. Example:
+
+.. code-block:: python
+
+ s1 >> w1 >> w2 >> t1.as_teardown(setups=s1) >> w3
+ w2 >> w4
+
+In the above example, ``w1`` and ``w2`` are "between" ``s1`` and ``t1`` and therefore are assumed to require ``s1``. Thus if ``w1`` or ``w2`` is cleared, ``s1`` and ``t1`` will be cleared as well. But if ``w3`` or ``w4`` is cleared, neither ``s1`` nor ``t1`` will be.
+
+Controlling dag run state
+"""""""""""""""""""""""""
+
+Another feature of setup / teardown tasks is that you can choose whether the teardown task should have an impact on dag run state. Perhaps you don't care if the "cleanup" work performed by your teardown task fails, and you only consider the dag run a failure if the "work" tasks fail. By default, teardown tasks are not considered for dag run state.
+
+Continuing with the example above, if you want the run's success to depend on ``delete_cluster``, then set ``on_failure_fail_dagrun=True`` when setting ``delete_cluster`` as teardown:
+
+.. code-block:: python
+
+    create_cluster >> run_query >> delete_cluster.as_teardown(setups=create_cluster, on_failure_fail_dagrun=True)
+
+Authoring with task groups
+""""""""""""""""""""""""""
+
+When arrowing from task group to task group, or from task group to task, we ignore teardowns. This allows teardowns to run in parallel, and allows dag execution to proceed even if teardown tasks fail.
+
+Consider this example:
+
+.. code-block:: python
+
+ with TaskGroup("my_group") as tg:
+ s1 = my_setup()
+ w1 = my_work()
+ t1 = my_teardown()
+ s1 >> w1 >> t1.as_teardown(setups=s1)
+ w2 = other_work()
+ tg >> w2
+
+If ``t1`` were not a teardown task, then this dag would effectively be ``s1 >> w1 >> t1 >> w2``. But since we have marked ``t1`` as a teardown, it is ignored in ``tg >> w2``. So the dag is equivalent to the following:
+
+.. code-block:: python
+
+ s1 >> w1 >> [t1.as_teardown(setups=s1), w2]
+
+Now let's consider an example with nesting:
+
+.. code-block:: python
+
+ with TaskGroup("my_group") as tg:
+ s1 = my_setup()
+ w1 = my_work()
+ t1 = my_teardown()
+ s1 >> w1 >> t1.as_teardown(setups=s1)
+ w2 = other_work()
+ tg >> w2
+ dag_s1 = dag_setup1()
+ dag_t1 = dag_teardown1()
+    dag_s1 >> [tg, w2] >> dag_t1.as_teardown(setups=dag_s1)
+
+In this example, ``s1`` is downstream of ``dag_s1``, so it must wait for ``dag_s1`` to complete successfully. But ``t1`` and ``dag_t1`` can run concurrently, because ``t1`` is ignored in the expression ``tg >> dag_t1``. If you clear ``w2``, it will clear ``dag_s1`` and ``dag_t1``, but not anything in the task group.
+
+Setup / teardown context manager
Review Comment:
Yeah so the problem is that `with s >> t: w` is not a direct replacement for `s >> w >> t.as_teardown(setups=s)`.
In order to use the context manager, you must have already marked the tasks as setups or teardowns. So in order to give a workable example of `with s >> t`, you must first explain how to mark the tasks as setup or teardown. And the way you do that is with `as_setup` and `as_teardown`, so I think these have to come first. BUT... the real value of the context manager comes (and is more easily demonstrated) when you have *multiple* tasks, perhaps with complicated dependencies, which you want to surround with a setup / teardown pair. So what I have done is move the setup context manager up higher, immediately after introducing these two methods, and I changed it to wrap a more "interesting" set of tasks.
Further, one thing that has really bothered me about the context manager is that it appears to be ambiguous: it seems like the `>>` is meaningful, but in reality it is not. By that I mean that with `with a >> b:` you tend to think that the `a >>` is doing some work in the production of the context manager, but really it's not -- the context manager is on `b` and that's it. And you must have already marked a and b as setup and teardown by the time you get to the context manager. And when doing that, you might as well mark the association between the setup and teardown. Let me illustrate.
Perhaps the example allowing us to demonstrate `with s >> t` is this:
```python
s = MySetupOp().as_setup()
t = MyTeardownOp().as_teardown()
with s >> t:
    w = MyWorkOp()
```
But equally we could do this:
```python
s = MySetupOp().as_setup()
t = MyTeardownOp().as_teardown(setups=s)
with t:
    w = MyWorkOp()
```
So in this sense, the `s >>` in `with s >> t` is superfluous.
So I think that the best example is something like this:
```python
with delete_cluster.as_teardown(setups=create_cluster):
    do_some_work() >> do_other_work()
    [some_stuff(), other_stuff()] >> more_stuff()
```
Thus the context manager is articulated more as being in the context of a teardown (and the scope it implies), rather than in the context of the binary implied by `with s >> t`, which is not actually well defined and invites odd phenomena. It's actually misleading in a way that something like `with s / t` (or some other syntax or function) would *not* be. For example, suppose you had `s1 >> t` and `s2 >> t`, and then you simply did `with s1 >> t: a()`. What you would actually get is `s1 >> a(); s2 >> a(); a() >> t` -- so including the `s1 >>` is both superfluous and potentially misleading. As a result, I think including the arrows in the context manager expression should probably be regarded as a bad practice.
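That last point can be sketched with a toy dependency model -- plain Python, *not* Airflow's implementation, with invented names -- showing that a work task entering a teardown's scope picks up *all* of that teardown's setups, not just the one named in the `with` expression:

```python
# Toy model (NOT Airflow's implementation) of how "with s1 >> t:" resolves
# when t is a teardown bound to more than one setup.

class Task:
    def __init__(self, name):
        self.name = name
        self.upstream = set()  # names of tasks this task depends on


class Teardown(Task):
    def __init__(self, name, setups):
        super().__init__(name)
        self.setups = list(setups)  # every setup bound to this teardown

    def add_work(self, work):
        # Entering the teardown's scope wires the work task downstream of
        # ALL of the teardown's setups, and upstream of the teardown itself.
        for s in self.setups:
            work.upstream.add(s.name)
        self.upstream.add(work.name)


s1, s2 = Task("s1"), Task("s2")
t = Teardown("t", setups=[s1, s2])  # i.e. both s1 >> t and s2 >> t

a = Task("a")
t.add_work(a)  # what "with s1 >> t: a()" effectively does

# Only s1 appeared in the "with" expression, yet a depends on s2 too:
print(sorted(a.upstream))  # ['s1', 's2']
print(sorted(t.upstream))  # ['a']
```

So the `s1 >>` in the expression buys you nothing: the wiring comes entirely from the teardown's recorded setups.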
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]