Hi all,

I’m currently working on the [Synchronous Dag Execution] feature and trying to 
gather opinions on how the Taskflow API should work when we want to mark a task 
as the dag’s “result task” (i.e. “the return value is a final output of the 
dag, not an intermediate value”).

[Synchronous Dag Execution]: https://github.com/apache/airflow/issues/51711

## Prior art (kind of)

We currently have the setup/teardown Taskflow API like this:

    @setup
    def f1(): ...

    @task
    def f2(): ...

    setup1 = f1()  # This is a setup task.

    t2 = f2()  # This is a normal task.
    setup2 = t2.as_setup()  # This is a setup task.

A teardown variant (@teardown and as_teardown()) also exists for both cases.

## The decorator syntax

The most straightforward syntax would be to have a @result decorator on a plain 
Python function. However, I don’t like this, since a result task still takes 
all the same arguments as a non-result task. (Setup and teardown tasks, by 
contrast, don’t accept most task arguments.) If @result needs to work on a 
plain function, it would need to duplicate and forward all the arguments of 
@task. I feel we can avoid this redundancy by requiring @result to be used ON 
TOP OF @task instead:

    @result
    @task(put your arguments here...)
    def f(): ...

We COULD also make @result without @task a shorthand for the argument-less 
case (which is probably common?):

    # This...
    @result
    def f(): ...

    # Is equivalent to...
    @result
    @task
    def f(): ...
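
To make the stacking and the shorthand concrete, here is a toy sketch that does 
not use Airflow at all. ToyTask, and the task and result functions below, are 
hypothetical stand-ins I made up for illustration; the real @task returns 
something quite different, and retries is just a placeholder argument.

```python
class ToyTask:
    """Hypothetical stand-in for whatever @task produces."""

    def __init__(self, fn, **kwargs):
        self.fn = fn
        self.kwargs = kwargs
        self.is_result = False


def task(fn=None, **kwargs):
    """Toy @task: usable bare (@task) or with arguments (@task(...))."""
    if fn is not None:
        return ToyTask(fn)
    return lambda f: ToyTask(f, **kwargs)


def result(obj):
    """Toy @result: accept a task, or wrap a plain function with bare @task."""
    t = obj if isinstance(obj, ToyTask) else task(obj)
    t.is_result = True
    return t


@result
@task(retries=3)
def explicit():
    return 1


@result
def shorthand():  # Same as stacking @result on a bare @task.
    return 2
```

The point is that result never needs to know about task arguments; it only 
flips a flag on whatever @task already built.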

Alternatively, we could use a fluent interface:

    @task(arguments here...).result
    def f(): ...

Pro: avoids adding another top-level name. Con: not a common pattern in Airflow.
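
For reference, an attribute access on a call is a legal decorator expression 
since Python 3.9 (PEP 614), so the fluent form parses as written. A toy sketch 
of how it could be wired up follows; TaskDecorator and ToyTask are hypothetical 
names, not Airflow classes, and retries is only a placeholder argument.

```python
class ToyTask:
    """Hypothetical stand-in for a task object."""

    def __init__(self, fn, kwargs, is_result):
        self.fn = fn
        self.kwargs = kwargs
        self.is_result = is_result


class TaskDecorator:
    """What the toy task(...) returns: a decorator that also has .result."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def __call__(self, fn):
        # Plain @task(...) usage: a normal, non-result task.
        return ToyTask(fn, self.kwargs, is_result=False)

    def result(self, fn):
        # @task(...).result usage: same arguments, flagged as result task.
        return ToyTask(fn, self.kwargs, is_result=True)


def task(**kwargs):
    return TaskDecorator(**kwargs)


@task(retries=2).result
def f():
    return "final"


@task(retries=2)
def g():
    return "intermediate"
```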

## The method syntax

I don’t think adding a method similar to as_setup/as_teardown makes sense here. 
Those methods make sense for setup/teardown because they allow the same body of 
code to be BOTH a setup/teardown task AND a normal task at the same time, as 
shown above. This does not apply to a result task: a task either returns the 
result, or it doesn’t. If we want a method-based syntax, it makes more sense to 
have a method on the dag:

    with DAG(...) as dag:
        @task
        def f(): ...

        t = f()
        dag.add_result(t)  # Marks f as the result task.
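
A toy sketch of what such a method could do, again without Airflow. ToyDAG and 
add_task are hypothetical; add_result is the proposed name, but its behaviour 
here (validating membership, allowing more than one result) is purely my 
assumption and is one of the things to decide.

```python
class ToyDAG:
    """Hypothetical stand-in for a dag that tracks its result tasks."""

    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.task_ids = []
        self.result_task_ids = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # Do not swallow exceptions.

    def add_task(self, task_id):
        self.task_ids.append(task_id)
        return task_id

    def add_result(self, task_id):
        # Assumption: only tasks that belong to this dag can be results.
        if task_id not in self.task_ids:
            raise ValueError(f"{task_id!r} is not a task in dag {self.dag_id!r}")
        self.result_task_ids.append(task_id)


with ToyDAG("example") as dag:
    t1 = dag.add_task("f")
    t2 = dag.add_task("g")
    dag.add_result(t2)
```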

## For @dag decorator

One more syntax, which only makes sense here, is to automatically detect the 
return value of an @dag-decorated function:

    @dag
    def my_dag():
        @task
        def f1(): ...

        @task
        def f2(v): ...

        result = f2(f1())

        return result  # Marks f2 as the result task!
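
A toy sketch of the detection, without Airflow. ToyXComArg, ToyDAG, and the 
task and dag functions here are hypothetical stand-ins; the idea is only that 
the @dag wrapper inspects what the decorated function returns.

```python
class ToyXComArg:
    """Hypothetical stand-in for the value a task call returns."""

    def __init__(self, task_id):
        self.task_id = task_id


class ToyDAG:
    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.result_task_id = None


def task(fn):
    """Toy @task: calling the task yields a reference, not the real value."""
    def call(*args):
        return ToyXComArg(fn.__name__)
    return call


def dag(fn):
    """Toy @dag: build the dag and treat a returned task as its result."""
    def factory(*args, **kwargs):
        built = ToyDAG(fn.__name__)
        returned = fn(*args, **kwargs)
        if isinstance(returned, ToyXComArg):
            built.result_task_id = returned.task_id
        return built
    return factory


@dag
def my_dag():
    @task
    def f1(): ...

    @task
    def f2(v): ...

    return f2(f1())


built = my_dag()
```

A function that returns nothing leaves result_task_id as None in this sketch, 
so existing @dag functions would behave as before.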

---------------

Looking forward to hearing thoughts on the above, and more ideas on possible 
syntaxes.

TP



