Re: Bad mixing of decorated and classic operators (users shooting themselves in their foot)

Elad Kalif Sun, 10 Mar 2024 00:29:56 -0800

The issue here is not just about decorators it happens also with regular
operators (operator inside operator) and also with operator inside
on_x_callback


For example:
https://stackoverflow.com/questions/64291042/airflow-call-a-operator-inside-a-function/
https://stackoverflow.com/questions/67483542/airflow-pythonoperator-inside-pythonoperator/



> I can't see which problem is solved by allowing running one operator
inside another.

>From the user's perspective, they have an operator that knows how to do
something and it's very easy to use. So they want to leverage that.
For example send Slack message:

slack_operator_post_text = SlackAPIPostOperator(
        task_id="slack_post_text",
        channel=SLACK_CHANNEL,
        text=("My message"),
    )

It handles everything. Now if you want to send a Slack message from a
PythonOperator you need to initialize a hook, find the right function to
invoke etc.
Thus from the user perspective - There is already a class that does all
that. Why can't it just work? Why do they need to "reimplement" the
operator logic? (most of the time it will be copy paste the logic of the
execute function)

So, the problem they are trying to solve is to avoid code duplication and
ease of use.

Jarek - I think your solution focuses more on the templating side but I
think the actual problem is not limited to the templating.
I think the problem is more of "I know there is an operator that does X, so
I will just use it inside the python function I invoke from the python
operator" - regardless of whether Jinja/templating becomes an issue or not.

On Sat, Mar 9, 2024 at 9:06 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> I see that we have already (thanks David!) a PR:
> https://github.com/apache/airflow/pull/37937 to forbid this use (which is
> cool and I am glad my discussion had some ripple effect :D ).
>
> I am quite happy to get this one merged once it passes tests/reviews, but I
> would still want to explore future departure / options we might have, maybe
> there will be another - long term - ripple effect :). I thought a bit more
> about  - possibly - different reasons why this pattern we observe is
> emerging and I have a theory.
>
> To Andrey's comments:
>
> > I can't see which problem is solved by allowing running one operator
> inside another.
>
> For me, the main problem to solve is that using Hooks in the way I
> described in
>
> https://medium.com/apache-airflow/generic-airflow-transfers-made-easy-5fe8e5e7d2c2
> in 2022 are almost non-discoverable by significant percentage of users.
> Especially those kinds of users that mostly treat Airflow Operators as
> black-box and **just** discovered task flow as a way that they can do
> simple things in Python - but they are not into writing their own custom
> operators, nor look at the operator's code. Generally they don't really see
> DAG authoring as writing Python Code, it's mostly about using a little
> weird DSL to build their DAGs. Mostly copy&pasting some constructs that
> look like putting together existing building blocks and using patterns like
> `>>` to add dependencies.
>
> Yes I try to be empathetic and try to guess how such users think about DAG
> authoring - I might be wrong, but this is what I see as a recurring
> pattern.
>
> So in this context - @task is not Python code writing, it's yet another DSL
> that people see as appealing. And the case (Here I just speculate - so I
> might be entirely wrong) I **think** the original pattern I posted above
> solve is that people think that they can slightly improve the flexibility
> of the operators by adding a bit of simple code before when they need a bit
> more flexibility and JINJA is not enough. Basically replacing this
>
> operator = AnOperator(with_param='{{ here I want some dynamicness }}')
>
> with:
>
> @task
> def my_task():
>     calculated_param = calculate_the_param()  # do something more complex
> that is difficult to do with JINJA expression
>     operator = AnOperator(with_param=calculated_param)
>     operator.execute()
>
> And I **think** the main issue to solve here is how to make it a bit more
> flexible to get parameters of operators pre-calculated **just** before the
> execute() method
>
> This is speculation of course - and there might be different motivations -
> but I think addressing this need better - might be actually solving the
> problem (combined with David's PR). If we find a way to pass more complex
> calculations to parameters of operators?
>
> So MAYBE (just maybe) we could do something like that (conceptual - name
> might be different)
>
>
>
> operator=AnOperator(with_param=RunThisBeforeExecute(callable=calculate_the_param))
>
> And let the user use a callable there:
>
> def calculate_the_param(context: dict) -> Any
>
> I **think** we could extend our "rendering JINJA template" to handle this
> special case for templated parameters. Plus, it would nicely solve the
> "render_as_native" problem - because that method could return the expected
> object rather than string (and every parameter could have its own
> method....
>
> Maybe that would be a good solution ?
>
> J.
>
>
>
>
>
>
> On Sun, Mar 3, 2024 at 12:03 AM Daniel Standish
> <daniel.stand...@astronomer.io.invalid> wrote:
>
> > One wrinkle to the have cake and eat it too approach is deferrable
> > operators. It doesn't seem it would be very practical to resume back into
> > the operator that is nested inside a taskflow function.  One solution
> would
> > be to run the trigger in process like we currently do with `dag.test()`.
> > That would make it non-deferrable in effect.  But at least it would run
> > properly.  There may be other better solutions.
> >
>

Re: Bad mixing of decorated and classic operators (users shooting themselves in their foot)

Reply via email to