Hi, would like to clarify, in this thread we're specifically hoping to get
community feedback on the proposal to drop the "implicit" logic.

In the original AIP, if you instantiate a setup task in a group, in effect
it's made the setup task for all tasks in the group.  And the proposal up
for discussion here is that we don't make that assumption.

So for example in the original AIP there's this example:

from airflow import DAG, task, setup, teardown


with DAG(dag_id='test'):
    @setup
    def create_cluster():
        ...
        return cluster_id

    @task
    def load(ti):
        cluster_id = ti.xcom_pull(task_id="create_cluster")

    def summarize():
        ...

    @teardown(on_failure_fail_dagrun=False)
    def teardown_cluster():
        ...
       cluster_id = ti.xcom_pull(task_id="create_cluster")

    create_cluster()
    load() >> summarize()
    teardown_cluster()

*Proposal: require to set deps always*

The proposal for discussion is that we require you to set the dependency :

create_cluster() >> load() >> summarize() >> teardown_cluster()

And the reason is both for compatiibility with multiple setup and teardown
tasks, as well as greater flexibility even in the case of one setup and
teardown pair. So for example it might be that your "setup" task isn't the
first thing in the group.  E.g.

with TaskGroup() as tg:
    send_some_email >> create_cluster >> run_query >> delete_cluster
    create_cluster >> delete_cluster

Here create cluster is the setup (for the query to run) and delete cluster
is the teardown, and sending the email just happens first.

And then when thinking ahead to a world of multiple setup and teardown
tasks, it becomes more important to specify deps.  Maybe some steps need to
run in sequence, maybe others need to run in parallel.

*Alternative 1: allow implicit in the simple case*

We can try and have it both ways. So, continue to allow users to have
"implicit" setups and teardowns, but only in the *simple* case, i.e. when
there's only one setup and teardown in the group, and when no upstream /
downstream has been set to it.

So for example here we'd allow you to not arrow the setup:

with TaskGroup() as tg:
    my_setup()
    my_task1() >> my_task2()
    my_teardown()

And Airflow would automatically add the relationship my_setup() >>
my_task1() >> my_task2() >> my_teardown().

But if there's more than one setup we'd force you to wire up the deps.  So
this would either be strictly disallowed or would simply not work as
expected:

with TaskGroup() as tg:
    my_setup1()
    my_setup2()
    my_task1() >> my_task2()
    my_teardown1()
    my_teardown2()

*Alternative 2: only implicit*

This alternative is essentially a complete rejection of the proposal.  You
can't set deps between /  among setup tasks / teardown tasks / normal
tasks.

So this would be permitted and work as expected:

with TaskGroup() as tg:
    my_setup1()
    my_task1() >> my_task2()
    my_teardown2()

This would not be permitted:

with TaskGroup() as tg:
    my_setup1() >> my_task1() >> my_task2() >> my_teardown2()

And then if multiple setups / teardowns were ever to be supported, either
you wouldn't be able to sequence the setups and teardowns, or we'd have to
come up with some other way to define their sequencing aside from `>>`.
And there would be no possibility of having setup tasks or teardown tasks
anywhere but at the start or end of the group.

We welcome your thoughts.  Thanks

Reply via email to