Just to add to the discussion - a discussion raised today
https://github.com/apache/airflow/discussions/38311  where the user is sure
that they can use operators in such a way as described above, and even used
the term "nested operator".

I think getting  https://github.com/apache/airflow/pull/37937  in will be a
good way in the future to prevent this misunderstanding, but maybe there is
something to think about - in the "Operators need to die" context by Bolke.

BTW. I have a hypothesis why those questions started to appear frequently
and people being reasonably sure they can do it. It's a pure speculation
(and I asked the user this time to explain) but some of that might be
fuelled by Chat GPT hallucinating about Airflow being able to do it. I saw
similar hallucinations before - where people suggested some (completely
wrong like that) solution to their problem and only after inquiry, they
admitted that it was a solution that ChatGPT gave them

I wonder if we have even more of those soon.

J.


On Sun, Mar 10, 2024 at 9:29 AM Elad Kalif <elad...@apache.org> wrote:

> The issue here is not just about decorators it happens also with regular
> operators (operator inside operator) and also with operator inside
> on_x_callback
>
> For example:
>
> https://stackoverflow.com/questions/64291042/airflow-call-a-operator-inside-a-function/
>
> https://stackoverflow.com/questions/67483542/airflow-pythonoperator-inside-pythonoperator/
>
>
>
> > I can't see which problem is solved by allowing running one operator
> inside another.
>
> From the user's perspective, they have an operator that knows how to do
> something and it's very easy to use. So they want to leverage that.
> For example send Slack message:
>
> slack_operator_post_text = SlackAPIPostOperator(
>         task_id="slack_post_text",
>         channel=SLACK_CHANNEL,
>         text=("My message"),
>     )
>
> It handles everything. Now if you want to send a Slack message from a
> PythonOperator you need to initialize a hook, find the right function to
> invoke etc.
> Thus from the user perspective - There is already a class that does all
> that. Why can't it just work? Why do they need to "reimplement" the
> operator logic? (most of the time it will be copy paste the logic of the
> execute function)
>
> So, the problem they are trying to solve is to avoid code duplication and
> ease of use.
>
> Jarek - I think your solution focuses more on the templating side but I
> think the actual problem is not limited to the templating.
> I think the problem is more of "I know there is an operator that does X, so
> I will just use it inside the python function I invoke from the python
> operator" - regardless of whether Jinja/templating becomes an issue or not.
>
> On Sat, Mar 9, 2024 at 9:06 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > I see that we have already (thanks David!) a PR:
> > https://github.com/apache/airflow/pull/37937 to forbid this use (which
> is
> > cool and I am glad my discussion had some ripple effect :D ).
> >
> > I am quite happy to get this one merged once it passes tests/reviews,
> but I
> > would still want to explore future departure / options we might have,
> maybe
> > there will be another - long term - ripple effect :). I thought a bit
> more
> > about  - possibly - different reasons why this pattern we observe is
> > emerging and I have a theory.
> >
> > To Andrey's comments:
> >
> > > I can't see which problem is solved by allowing running one operator
> > inside another.
> >
> > For me, the main problem to solve is that using Hooks in the way I
> > described in
> >
> >
> https://medium.com/apache-airflow/generic-airflow-transfers-made-easy-5fe8e5e7d2c2
> > in 2022 are almost non-discoverable by significant percentage of users.
> > Especially those kinds of users that mostly treat Airflow Operators as
> > black-box and **just** discovered task flow as a way that they can do
> > simple things in Python - but they are not into writing their own custom
> > operators, nor look at the operator's code. Generally they don't really
> see
> > DAG authoring as writing Python Code, it's mostly about using a little
> > weird DSL to build their DAGs. Mostly copy&pasting some constructs that
> > look like putting together existing building blocks and using patterns
> like
> > `>>` to add dependencies.
> >
> > Yes I try to be empathetic and try to guess how such users think about
> DAG
> > authoring - I might be wrong, but this is what I see as a recurring
> > pattern.
> >
> > So in this context - @task is not Python code writing, it's yet another
> DSL
> > that people see as appealing. And the case (Here I just speculate - so I
> > might be entirely wrong) I **think** the original pattern I posted above
> > solve is that people think that they can slightly improve the flexibility
> > of the operators by adding a bit of simple code before when they need a
> bit
> > more flexibility and JINJA is not enough. Basically replacing this
> >
> > operator = AnOperator(with_param='{{ here I want some dynamicness }}')
> >
> > with:
> >
> > @task
> > def my_task():
> >     calculated_param = calculate_the_param()  # do something more complex
> > that is difficult to do with JINJA expression
> >     operator = AnOperator(with_param=calculated_param)
> >     operator.execute()
> >
> > And I **think** the main issue to solve here is how to make it a bit more
> > flexible to get parameters of operators pre-calculated **just** before
> the
> > execute() method
> >
> > This is speculation of course - and there might be different motivations
> -
> > but I think addressing this need better - might be actually solving the
> > problem (combined with David's PR). If we find a way to pass more complex
> > calculations to parameters of operators?
> >
> > So MAYBE (just maybe) we could do something like that (conceptual - name
> > might be different)
> >
> >
> >
> >
> operator=AnOperator(with_param=RunThisBeforeExecute(callable=calculate_the_param))
> >
> > And let the user use a callable there:
> >
> > def calculate_the_param(context: dict) -> Any
> >
> > I **think** we could extend our "rendering JINJA template" to handle this
> > special case for templated parameters. Plus, it would nicely solve the
> > "render_as_native" problem - because that method could return the
> expected
> > object rather than string (and every parameter could have its own
> > method....
> >
> > Maybe that would be a good solution ?
> >
> > J.
> >
> >
> >
> >
> >
> >
> > On Sun, Mar 3, 2024 at 12:03 AM Daniel Standish
> > <daniel.stand...@astronomer.io.invalid> wrote:
> >
> > > One wrinkle to the have cake and eat it too approach is deferrable
> > > operators. It doesn't seem it would be very practical to resume back
> into
> > > the operator that is nested inside a taskflow function.  One solution
> > would
> > > be to run the trigger in process like we currently do with
> `dag.test()`.
> > > That would make it non-deferrable in effect.  But at least it would run
> > > properly.  There may be other better solutions.
> > >
> >
>

Reply via email to