I am torn between 1) and 3). While 1) is tempting and would simplify our state management, I think 3) is the safer choice. If we have not yet been able to convert all of our own operators to be deferrable, there are probably many thousands of custom ones that will stop working if we remove that feature.
If we go with 1), then any operator that today relies on "poke & reschedule" would just have to be a "normal sensor" and simply start taking resources while waiting. Technically that is not "breaking" the flow, but it is likely to break installations that rely heavily on rescheduling, and it would dramatically increase resource usage. And there is no easy way out short of rewriting all such operators to support deferrable.

Personally, I think that in making such a decision we should consider two things:

1) Will this stop some people from migrating to Airflow 3 because it will be a "heavy operation"?
2) How likely do we think that is to happen, and will we lose big, important users who might be "success stories" for Airflow 3?

I have no data to back it up (maybe some people here have it), but my intuition tells me that:

1) Yes, it will stop some users from migrating to Airflow 3 (because they will either have to accept increased resource usage or find engineering time to rewrite their custom operators).
2) Yes, I think it is quite likely, and it is quite likely that big users who could be an "Airflow 3 success story" would be affected.

But I am guessing. If someone could provide data showing that either assumption 1) or 2) is false, I am happy to support option 1). For now it's 3).

On Wed, Nov 13, 2024 at 2:43 PM Abhishek Bhakat <abhishek.bha...@astronomer.io.invalid> wrote:

> +1 to 3. For cases where absolute minimal latency is critical, and worker
> resources aren't constrained, poke mode could still be the optimal choice.
> I don't see any value in reschedule mode anymore, deferrable should be the
> default.
>
> On Wed, Nov 13, 2024 at 12:21 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>
> > There is 4th option to keep things as-is too :)
> >
> > On Wed, 13 Nov 2024 at 12:19, Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Following up on the Dev call discussions last Thursday, I am opening this
> > > up for discussion.
> > >
> > > Reschedule mode was introduced to improve efficiency over poke mode by
> > > allowing tasks to wait without holding a worker slot. Since the
> > > introduction of deferrable operators in Airflow 2.2, however, we now have
> > > an even more optimal, async-driven solution. The adoption of deferrable
> > > operators has been really good, and since we are already chopping things
> > > off with Airflow 3 it might be time to consider making them the default
> > > mode.
> > >
> > > This will ensure that our users always have the most optimal way of
> > > running sensors by default and that we, the maintainers or folks supporting
> > > Airflow deployments in companies, do not need to know different approaches
> > > with Reschedule mode, either.
> > >
> > > However, not all sensors can be async, either due to limitations in
> > > underlying libraries or a lack of unique ids for async polling.
> > >
> > > Knowing that we have a few options:
> > >
> > > 1) *Remove Poke & Reschedule modes*
> > >
> > > This is aggressive and it means we will have to remove things like
> > > PostgresSensor that does not support async.
> > >
> > > 2) *Remove Reschedule mode*
> > >
> > > Make deferrable the primary mode, falling back to poke where async isn’t
> > > supported.
> > >
> > > 3) *Make Deferrable the default, keep Poke & Reschedule*
> > >
> > > This is a defensive option that maintains current behaviour but ensures
> > > that we have the most performant option by default. It could be as simple
> > > as making AIRFLOW__OPERATORS__DEFAULT_DEFERRABLE default to True.
> > >
> > > I’d love to hear feedback, especially from users who rely on reschedule
> > > mode today!
> > >
> > > Regards,
> > > Kaxil
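
To put rough numbers on the resource concern around option 1), here is a back-of-the-envelope sketch in plain Python. It is not Airflow code and not a measurement: the function name `worker_slot_seconds`, the cost model, and all input values are assumptions made up for illustration. It only counts how long one sensor occupies a worker slot, and it ignores scheduler and triggerer overhead entirely.

```python
import math


def worker_slot_seconds(mode, wait_s, poke_interval_s, poke_cost_s):
    """Very rough worker-slot time consumed by ONE sensor (hypothetical model).

    mode            -- "poke", "reschedule" or "deferrable"
    wait_s          -- how long until the sensed condition becomes true
    poke_interval_s -- time between checks
    poke_cost_s     -- assumed time a single check keeps a worker busy
    """
    # One check at t=0, then one per interval until the condition is met.
    pokes = math.ceil(wait_s / poke_interval_s) + 1

    if mode == "poke":
        # Assumption: the task holds its slot for the entire wait.
        return float(wait_s)
    if mode == "reschedule":
        # Assumption: the slot is only held while a check is running.
        return float(pokes * poke_cost_s)
    if mode == "deferrable":
        # Assumption: one short run to defer, one short run to resume;
        # the waiting itself happens outside the worker.
        return float(2 * poke_cost_s)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Hypothetical sensor: waits one hour, checks every minute, 2 s per check.
    for mode in ("poke", "reschedule", "deferrable"):
        print(mode, worker_slot_seconds(mode, 3600, 60, 2))
```

Under these made-up inputs, a custom sensor that cannot be made deferrable and is pushed from reschedule to poke goes from 122 to 3600 worker-slot seconds per run. Real numbers will differ per deployment, which is exactly the data I would like to see before supporting option 1).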