I’m part of a group working on the implementation of AIP-52.  We would like
to update the community on some changes to the implementation approach and
the planned roadmap, and to give you an opportunity to provide feedback.

First, though, let’s briefly recap the main benefits of adding setup and
teardown as concepts in Airflow:

   - By separating setup and teardown from “work” tasks, we can stop the dag
     from proceeding after a work task fails (i.e. onto subsequent work
     tasks) while still allowing the needed teardown operations to proceed
     (e.g. deleting a cluster).
   - This separation also lets us optionally *not* fail a dag run when the
     important work completed successfully but a cleanup or teardown
     operation failed.
   - By associating work tasks with their setup tasks, we can clear the
     setups (and their respective teardowns) when clearing the work tasks.

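The first two benefits can be illustrated with a toy model in plain Python that takes the final task states as given and derives the run's outcome. The function `dag_run_state` and its `ignore_teardown_failures` flag are hypothetical, not Airflow APIs; the state names merely mirror Airflow's task states.

```python
# Hypothetical toy model of how a finished dag run's state could be derived;
# not Airflow's scheduler or API.
from typing import Dict, Set


def dag_run_state(
    task_states: Dict[str, str],
    teardowns: Set[str],
    ignore_teardown_failures: bool = True,
) -> str:
    """Return "failed" if any counted task failed, otherwise "success"."""
    for task_id, state in task_states.items():
        if task_id in teardowns and ignore_teardown_failures:
            continue  # a failed cleanup does not fail the run
        if state in ("failed", "upstream_failed"):
            return "failed"
    return "success"


# Benefit 1 scenario: work1 failed, so work2 never ran, but the teardown still ran.
print(dag_run_state(
    {"setup": "success", "work1": "failed", "work2": "upstream_failed", "teardown": "success"},
    {"teardown"},
))  # → failed

# Benefit 2 scenario: the work succeeded and only the cleanup failed.
print(dag_run_state(
    {"setup": "success", "work1": "success", "teardown": "failed"},
    {"teardown"},
))  # → success
```

Passing `ignore_teardown_failures=False` makes the second scenario report `failed`, which is the "optionally" in the second benefit.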

After experimenting with several implementation approaches and reviewing
and writing many example dags, we reached some conclusions that caused us
to change course somewhat, while still fulfilling the principal goals of
the AIP.

Perhaps most importantly, we believe it is essential that our design
choices leave room for multiple setup and teardown tasks in a given task
group or dag.  Dags don’t tend to do just one thing.  In a dag there could
be many tasks requiring their own “setup” and “teardown”.  Similarly, a
single “work” task may itself require multiple “setup” and “teardown” tasks.
For obvious reasons, combining the work of multiple operators into a single
task is not advisable.  Requiring a new task group for each thing that
needs a setup also has pitfalls: it conflicts with the task group’s role as
an arbitrary logical grouping of tasks and as a tool for task mapping.  So
we believe it is necessary to support multiple setups within a group, and
also to be able to set dependencies between them.

With that in mind, the main change we’d like to share is that users must
now explicitly specify the relationships between setup/teardown tasks and
“normal” tasks.  *(In the original proposal, users were not required to set
relationships between setup/teardown tasks and the other tasks in the
group.)*

So in the original AIP you could do this:

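from airflow.utils.task_group import TaskGroup

# my_setup, my_work and my_teardown stand in for user-defined tasks
with TaskGroup("group1") as tg1:
    setup1 = my_setup("g1_setup")  # a setup task

    work1 = my_work("g1_work1")
    work2 = my_work("g1_work2")
    work1 >> work2

    teardown1 = my_teardown("g1_teardown")  # a teardown task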

Then in effect you’d get setup1 >> work1 >> work2 >> teardown1.
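
One plausible reading of that implicit wiring, sketched as a toy model in plain Python: every setup goes upstream of the group's root work tasks and every teardown goes downstream of its leaf work tasks. The function `implicit_edges` is hypothetical and is not the original AIP's implementation.

```python
# Hypothetical toy model of the ORIGINAL AIP's implicit wiring; not Airflow code.
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (upstream_task, downstream_task)


def implicit_edges(
    setups: List[str],
    work_tasks: List[str],
    work_edges: List[Edge],
    teardowns: List[str],
) -> Set[Edge]:
    """Put every setup before the group's root work tasks and every
    teardown after its leaf work tasks."""
    has_upstream = {down for _, down in work_edges}
    has_downstream = {up for up, _ in work_edges}
    roots = [t for t in work_tasks if t not in has_upstream]
    leaves = [t for t in work_tasks if t not in has_downstream]

    edges = set(work_edges)
    for setup in setups:
        edges.update((setup, root) for root in roots)
    for teardown in teardowns:
        edges.update((leaf, teardown) for leaf in leaves)
    return edges


# The group from the example: work1 >> work2, plus one setup and one teardown.
print(sorted(implicit_edges(["setup1"], ["work1", "work2"], [("work1", "work2")], ["teardown1"])))
# → [('setup1', 'work1'), ('work1', 'work2'), ('work2', 'teardown1')]
```

Adding a second pair (setup2, teardown2) to this model simply puts both setups before every root, even if setup2 was only meant for work2; nothing in the dag tells the model which pairing or ordering was intended.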

Now we require you to set those relationships explicitly.  Without them, if
you were to add a setup2 and a teardown2, it would not be clear what the
task sequencing should be.  Beyond that, we believe being explicit is
important for readability: unless you are careful with object naming in
your dag, it may not be obvious that setup1 and teardown1 are not “normal”
tasks, and therefore it might appear that they are free to run in parallel
as roots of the group.

Looking further ahead, while some of the design decisions are not yet
finalized, we’d like to give you a fuller preview of where we see this
going and how it should work.

At a high level, our approach is to make setup and teardown much more like
“normal” tasks, able to be organized and combined with all the flexibility
that Airflow users are accustomed to.  The behavior is mainly governed by a
few simple rules:

   - A teardown task will run if its setup has completed successfully and
     its upstreams are done.
   - The setup tasks “required by” a work task will be cleared when the
     work task is cleared.

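The two rules can be written down as small predicates in a toy model. The names `teardown_can_run`, `tasks_to_clear`, `required_setups` and `teardown_of`, and the choice of which states count as "done", are assumptions made for illustration; the teardowns are added to the cleared set following the third benefit listed at the top of this email.

```python
# Hypothetical toy encoding of the two rules; plain Python, not Airflow.
from typing import Dict, Iterable, Set

# Assumption: "done" means finished in any terminal state, not only success.
DONE_STATES = {"success", "failed", "skipped", "upstream_failed"}


def teardown_can_run(setup_state: str, upstream_states: Iterable[str]) -> bool:
    """Rule 1: run the teardown if its setup succeeded and all upstreams are done."""
    return setup_state == "success" and all(s in DONE_STATES for s in upstream_states)


def tasks_to_clear(
    work_tasks: Iterable[str],
    required_setups: Dict[str, Set[str]],
    teardown_of: Dict[str, str],
) -> Set[str]:
    """Rule 2: clearing work tasks also clears the setups they require,
    together with those setups' teardowns."""
    work = set(work_tasks)
    cleared = set(work)
    for task in work:
        for setup in required_setups.get(task, set()):
            cleared.add(setup)
            if setup in teardown_of:
                cleared.add(teardown_of[setup])
    return cleared
```

For example, with a successful setup and upstream states `["failed", "success"]`, `teardown_can_run` returns `True`: the failed work task does not prevent the cleanup, whereas a still-running upstream would.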

When using multiple setups and teardowns, you will need to specify which
setup belongs to which teardown.  The setups “required by” a work task can
then be inferred from the work task’s position between a setup and its
teardown.
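
That inference can be sketched as a reachability check on the task graph: a setup is treated as required by a work task when the work task is downstream of the setup and upstream of that setup's teardown. The helpers `reachable` and `required_setups`, and the example graph, are hypothetical; the actual implementation may handle edge cases differently.

```python
# Hypothetical sketch of inferring "required" setups from graph position;
# plain Python, not Airflow's implementation.
from collections import deque
from typing import Dict, List, Set


def reachable(start: str, adjacency: Dict[str, List[str]]) -> Set[str]:
    """All tasks reachable from `start` by following downstream edges."""
    seen: Set[str] = set()
    queue = deque(adjacency.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(adjacency.get(node, []))
    return seen


def required_setups(work: str, downstream: Dict[str, List[str]], teardown_of: Dict[str, str]) -> Set[str]:
    """A setup is required by `work` if `work` lies between it and its teardown."""
    after_work = reachable(work, downstream)
    return {
        setup
        for setup, teardown in teardown_of.items()
        if work in reachable(setup, downstream) and teardown in after_work
    }


# setup1 >> work1 >> work2 >> teardown1 and setup2 >> work2 >> teardown2
downstream = {
    "setup1": ["work1"],
    "work1": ["work2"],
    "setup2": ["work2"],
    "work2": ["teardown1", "teardown2"],
}
teardown_of = {"setup1": "teardown1", "setup2": "teardown2"}

print(sorted(required_setups("work1", downstream, teardown_of)))  # → ['setup1']
print(sorted(required_setups("work2", downstream, teardown_of)))  # → ['setup1', 'setup2']
```

Under this model, clearing work1 would clear only setup1 (and teardown1), while clearing work2 would clear both pairs.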

OK, any more detail would be too much for one email.  If you are
interested in reviewing our progress in greater detail and commenting, our
working draft update is here:
https://cwiki.apache.org/confluence/display/AIRFLOW/%5BWIP%5D+DRAFT+updates+to+AIP-52
We’ve added lots of examples with graph screenshots to help illustrate the
behavior, and there’s some discussion of the ways it differs from the
original.


Thanks for your consideration.


Daniel
