I see that we have already (thanks David!) a PR:
https://github.com/apache/airflow/pull/37937 to forbid this use (which is
cool and I am glad my discussion had some ripple effect :D ).

I am quite happy to get this one merged once it passes tests/reviews, but I
would still like to explore the future directions / options we might have -
maybe there will be another, longer-term, ripple effect :). I thought a bit
more about the - possibly different - reasons why this pattern we observe is
emerging, and I have a theory.

To Andrey's comments:

> I can't see which problem is solved by allowing running one operator
inside another.

For me, the main problem to solve is that using Hooks in the way I
described in
https://medium.com/apache-airflow/generic-airflow-transfers-made-easy-5fe8e5e7d2c2
back in 2022 is almost non-discoverable for a significant percentage of
users. Especially those kinds of users who mostly treat Airflow Operators as
a black box and have **just** discovered TaskFlow as a way to do simple
things in Python - but they are not into writing their own custom operators,
nor do they look at the operators' code. Generally they don't really see
DAG authoring as writing Python code; it's mostly about using a slightly
weird DSL to build their DAGs - mostly copy-and-pasting constructs that
look like putting together existing building blocks and using patterns like
`>>` to add dependencies.

Yes, I try to be empathetic and guess how such users think about DAG
authoring - I might be wrong, but this is what I see as a recurring pattern.

So in this context, @task is not Python code writing; it's yet another DSL
that people find appealing. And the case (here I just speculate, so I
might be entirely wrong) that I **think** the original pattern I posted above
solves is that people think they can slightly improve the flexibility
of operators by adding a bit of simple code beforehand, when they need a bit
more flexibility and Jinja is not enough. Basically, replacing this:

operator = AnOperator(with_param='{{ here I want some dynamicness }}')

with:

@task
def my_task():
    # do something more complex that is difficult to do with a Jinja expression
    calculated_param = calculate_the_param()
    operator = AnOperator(with_param=calculated_param)
    operator.execute()

And I **think** the main issue to solve here is how to make it a bit easier
to get the parameters of operators pre-calculated **just** before the
execute() method runs.

This is speculation of course - and there might be different motivations -
but I think addressing this need better (combined with David's PR) might
actually solve the problem. What if we found a way to pass more complex
calculations to the parameters of operators?

So MAYBE (just maybe) we could do something like this (conceptual - the
name might be different):


operator=AnOperator(with_param=RunThisBeforeExecute(callable=calculate_the_param))

And let the user use a callable there:

def calculate_the_param(context: dict) -> Any:

I **think** we could extend our "rendering Jinja templates" step to handle
this special case for templated parameters. Plus, it would nicely solve the
"render_as_native" problem - because that callable could return the expected
native object rather than a string (and every parameter could have its own
callable...).

Maybe that would be a good solution?
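For illustration, here is a minimal Python sketch of what I mean - all the
names here (RunThisBeforeExecute, render_param) are hypothetical and not the
real Airflow API; in a real implementation this special-casing would hook
into the operator's template-field rendering step:

```python
from typing import Any, Callable


class RunThisBeforeExecute:
    """Hypothetical wrapper: holds a callable whose result replaces the
    parameter value just before execute() runs."""

    def __init__(self, callable: Callable[[dict], Any]):
        self.callable = callable

    def resolve(self, context: dict) -> Any:
        # Returns a native Python object, not a rendered string - which is
        # what would also sidestep the "render_as_native" problem.
        return self.callable(context)


def render_param(value: Any, context: dict) -> Any:
    """Hypothetical rendering step: special-case the wrapper, otherwise
    fall through to whatever normal Jinja rendering would do."""
    if isinstance(value, RunThisBeforeExecute):
        return value.resolve(context)
    return value


# Usage sketch: the user supplies a callable taking the task context.
def calculate_the_param(context: dict) -> Any:
    return {"run_id": context["run_id"], "rows": [1, 2, 3]}


param = render_param(
    RunThisBeforeExecute(callable=calculate_the_param),
    context={"run_id": "manual__2024-03-03"},
)
# param is now a native dict computed just before execution
```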

J.

On Sun, Mar 3, 2024 at 12:03 AM Daniel Standish
<daniel.stand...@astronomer.io.invalid> wrote:

> One wrinkle to the have cake and eat it too approach is deferrable
> operators. It doesn't seem it would be very practical to resume back into
> the operator that is nested inside a taskflow function.  One solution would
> be to run the trigger in process like we currently do with `dag.test()`.
> That would make it non-deferrable in effect.  But at least it would run
> properly.  There may be other better solutions.
>
