I am torn between 1) and 3). While 1) is tempting and would simplify our
state management, I think 3) is the safer choice. If even we have not yet
been able to convert all our own operators to be deferrable, there are
probably many thousands of custom ones that will stop working if we remove
poke and reschedule modes.

Alternatively, we could go with 1) but have any operator that currently
uses "poke & reschedule" become a "normal sensor" that simply holds
resources while waiting. Technically that does not "break" the flow, but it
would likely break installations that rely heavily on rescheduling, because
it would dramatically increase resource usage. And there is no easy way out
short of rewriting all such operators to support deferrable mode.
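
To make the resource argument a bit more concrete, here is a rough
back-of-the-envelope sketch. It is not Airflow code: the function name
slot_seconds and its parameters are made up for illustration, and the model
ignores task start-up overhead and scheduler latency. It only assumes that a
poke-mode sensor keeps its worker slot for the whole wait, while a
reschedule-mode sensor holds a slot only while a poke is actually running.

```python
import math


def slot_seconds(mode: str, wait_s: float, poke_interval_s: float,
                 poke_cost_s: float) -> float:
    """Rough worker-slot time one sensor occupies until its condition is met.

    Hypothetical model for illustration only, not an Airflow API.
    """
    if mode == "poke":
        # The task sleeps between pokes but keeps its worker slot throughout.
        return wait_s
    if mode == "reschedule":
        # The slot is released between pokes, so only the pokes cost time.
        # One poke at the start plus one per elapsed interval.
        pokes = math.ceil(wait_s / poke_interval_s) + 1
        return pokes * poke_cost_s
    raise ValueError(f"unknown mode: {mode!r}")


# Illustrative inputs (assumed, not measured): a one-hour wait,
# a 60-second poke interval, and one second of work per poke.
poke_usage = slot_seconds("poke", 3600, 60, 1.0)
reschedule_usage = slot_seconds("reschedule", 3600, 60, 1.0)
```

With these assumed inputs the poke-mode sensor occupies a slot for 3600
seconds and the reschedule-mode one for 61 seconds. The exact figures do not
matter; the point is that the gap grows with the length of the wait.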

Personally, I think that in making such a decision we should consider two
things:

1) will this stop some people from migrating to Airflow 3 because it will
be a "heavy operation"?
2) how likely do we think that is to happen - will we lose big, important
users who might be "success" stories for Airflow 3?

I have no data to back this up (maybe some people here have it), but my
intuition tells me that:

1) yes, it will stop some users from migrating to Airflow 3 (because they
will either have to accept increased resource usage or find engineering
time to rewrite their custom operators)
2) yes, I think it's quite likely, and quite likely that big users who
could be an "Airflow 3 success story" will be affected

But I am guessing. If someone can provide data showing that either
assumption 1) or 2) is false, I am happy to support option 1). For now
it's 3).



On Wed, Nov 13, 2024 at 2:43 PM Abhishek Bhakat
<abhishek.bha...@astronomer.io.invalid> wrote:

> +1 to 3. For cases where absolute minimal latency is critical, and worker
> resources aren't constrained, poke mode could still be the optimal choice.
> I don't see any value in reschedule mode anymore, deferrable should be the
> default.
>
> On Wed, Nov 13, 2024 at 12:21 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>
> > There is 4th option to keep things as-is too :)
> >
> > On Wed, 13 Nov 2024 at 12:19, Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Following up on the Dev call discussions last Thursday, I am opening
> this
> > > up for discussion.
> > >
> > > Reschedule mode was introduced to improve efficiency over poke mode by
> > > allowing tasks to wait without holding a worker slot. Since the
> > > introduction of deferrable operators in Airflow 2.2, however, we now
> have
> > > an even more optimal, async-driven solution. The adoption of deferrable
> > > operators has been really good, and since we are already chopping
> things
> > > off with Airflow 3 it might be time to consider making them the default
> > > mode.
> > >
> > > This will ensure that our users always have the most optimal way of
> > > running sensors by default and that we, the maintainers or folks
> > supporting
> > > Airflow deployments in companies, do not need to know different
> > approaches
> > > with Reschedule mode, either.
> > >
> > > However, not all sensors can be async, either due to limitations in
> > > underlying libraries or a lack of unique ids for async polling.
> > >
> > > Knowing that we have a few options:
> > >
> > > 1) *Remove Poke & Reschedule modes*
> > >
> > > This is aggressive and it means we will have to remove things like
> > > PostgresSensor that does not support async.
> > >
> > > 2) *Remove Reschedule mode *
> > >
> > > Make deferrable the primary mode, falling back to poke where async
> isn’t
> > > supported.
> > >
> > > 3) *Make Deferrable the default, keep Poke & Reschedule*
> > >
> > > This is a defensive option that maintains current behaviour but ensures
> > > that we have the most performant option by default. It could be as
> simple
> > > as making  AIRFLOW__OPERATORS__DEFAULT_DEFERRABLE default to True.
> > >
> > > I’d love to hear feedback, especially from users who rely on reschedule
> > > mode today!
> > >
> > > Regards,
> > > Kaxil
> > >
> >
>
