> Seems that option 2 gives the most while
still providing fallback for users - so why go straight from 1 to 3 if
1 looks impossible?

I think the problem with this is that while functionally it will work (i.e.
yes, there is fallback), it is not feasible and sustainable for many big
users. If you have a sensor running for 4 hours, taking up the whole python
interpreter running for the whole 4 runners and multiplying it by 1000
sensors waiting, you might suddenly start paying 5x for your infrastructure
to run Airflow. I can't remember exactly but I remember there were
scenarios where deferrable in some cases meant 10x less resource usage,
with rescheduling you can easily get 5x (I am guessing here - it might be
3x but the resource savings here are huge).

So with option 2) you basically you give users two options, both of them
quite expensive on top of "generic" Airflow 3 migration cost

a) rewrite their sensors to deferrable (Costly - you need engineering time
if you have say 100 custom sensors and 1000s of DAGs using them)
b) Start paying many times more for your infrastructure until you rewrite
those

If I was given only those two options, I'd choose option c) - stay with
Airflow 2 for as long as I can and do nothing

With option 2) hoewver, we can introduce a warning for reschedule mode and
direct the user, explaining to them that they can get even more resource
savings and allowing them to gradually rewrite their custom sensors to
deferrable versions.

J.









On Thu, Nov 14, 2024 at 10:36 AM Maciej Obuchowski <mobuchow...@apache.org>
wrote:

> Not sure why most people choose between 1 and 3, when the
> reschedule mode seems to be the biggest issue. Removing poke
> does not remove a lot of complexity - while removing reschedule
> simplifies the scheduler. Seems that option 2 gives the most while
> still providing fallback for users - so why go straight from 1 to 3 if
> 1 looks impossible?
>
> czw., 14 lis 2024 o 10:17 Igor Kholopov <ikholo...@google.com.invalid>
> napisał(a):
>
> > Option 1 is a non-starter from the compatibility standpoint. Not every
> > reschedulable sensor can be deferrable as the retriable check operation
> > itself can have arbitrary execution time.
> >
> > Option 2 is a way to go if we want to formally sunset sensors as the
> > dedicated mechanism. Option 3 is almost keeping status-quo with a hope
> that
> > eventually people will stick with a new default, if we go with it we need
> > to generate a very explicit deprecation warning for "reschedule" mode.
> >
> > Given that, I believe both Option 2 and 3 are acceptable. I guess we will
> > need to wait until Airflow 4 to fully remove reschedule mode.
> >
> > Regards,
> > Igor
> >
> > On Thu, Nov 14, 2024 at 12:31 AM Pavankumar Gopidesu <
> > gopidesupa...@gmail.com> wrote:
> >
> > > Great points. Maintaining both sync and async hooks is indeed
> > challenging,
> > > and making the triggerer mandatory in Airflow 3 seems like a sensible
> > > move to simplify the system.  However, for users heavily relying on
> > > the reschedule option,
> > > migrating to Airflow 3 could be difficult if this option isn’t
> > > available. So at this point I am more towards option 3.
> > >
> > > Regards,
> > > Pavan Kumar
> > >
> > > On Wed, Nov 13, 2024 at 10:13 PM Jarek Potiuk <ja...@potiuk.com>
> wrote:
> > > >
> > > > That's a good point Daniel but I think it's a bit more nuanced,
> > > >
> > > > The reschedule mode in community providers serves a purpose for
> Airflow
> > > 2.
> > > > If a deployment in Airflow 2 does not have Triggerer - then
> deferrable
> > > > version of the sensor will not work for it.
> > > > Triggerer was really optional in Airflow 2. I bet a number of people
> > who
> > > > migrated from Airflow 2.1 or 2.2 do not still have Triggerer running.
> > > >
> > > > We have not discussed it but I guess for Airflow 3 we should make
> > > Triggerer
> > > > mandatory - which - if we do it - indeed makes the "reschedule"
> option
> > > for
> > > > our providers redundant.
> > > >
> > > > And yes - I think it is valuable to remove rescheduler options for
> the
> > > > reasons Kaxil mentioned.
> > > >
> > > > But IMHO we should only do that when min-airflow-version >= 3.0
> (i.e. ~
> > > > October 2025), otherwise we are taking away promised provider
> features
> > > for
> > > > Airflow 2 users.
> > > >
> > > >
> > > > J.
> > > >
> > > > On Wed, Nov 13, 2024 at 10:38 PM Kaxil Naik <kaxiln...@gmail.com>
> > wrote:
> > > >
> > > > > Yeah, as a maintainer it is pain to have to maintain both sync and
> > > async
> > > > > version of a hook for a Sensor in providers
> > > > >
> > > > > On Wed, 13 Nov 2024 at 20:42, Daniel Standish
> > > > > <daniel.stand...@astronomer.io.invalid> wrote:
> > > > >
> > > > > > Yeah everything is related. I'm just saying for the purpose of
> > > picking a
> > > > > > path forward i'm just trying to clarify the options.
> > > > > >
> > > > > > If we can rule out any of the options I presented we'll have made
> > > some
> > > > > > progress.
> > > > > >
> > > > > > Do you have an opinion on any of those?
> > > > > >
> > > > > > I would probably lean towards keep reschedule interface, keep it
> on
> > > base
> > > > > > sensor, remove it from providers.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Nov 13, 2024 at 12:37 PM Kaxil Naik <kaxiln...@gmail.com
> >
> > > wrote:
> > > > > >
> > > > > > > > I think we need to separate what to do about the sensors we
> > have
> > > in
> > > > > > > > providers from the core interface, essentially.
> > > > > > >
> > > > > > >
> > > > > > > I think they are interconnected, especially if one of the goals
> > is
> > > to
> > > > > > have
> > > > > > > the "most performant" option (deferral) as the default or the
> > only
> > > > > option
> > > > > > > for the users, regardless of the internal.
> > > > > > >
> > > > > > > From the user's POV, they can care about performance and
> > visibility
> > > > > > > ("rescheduled"/"deferral" as TI states).
> > > > > > >
> > > > > > > The secondary issue is what you are mentioning about the
> > > implementation
> > > > > > of
> > > > > > > "reschedule" method and what to do when reschedule happens,
> IIRC
> > > none
> > > > > of
> > > > > > > the Operators in the "official" providers do anything special
> > then
> > > > > > > inheriting BaseSensorOperator.
> > > > > > >
> > > > > > > 1. remove the rescheduling interface in core.  No more "up for
> > > > > > reschedule"
> > > > > > > > state etc.  No more reschedule mode in base sensor. No more
> > > > > reschedule
> > > > > > > mode
> > > > > > > > in any derivative sensor.
> > > > > > > > 2. keep rescheduling in core. remove from base sensor and
> > > > > derivatives.
> > > > > > > > 3. keep reschedule in core. keep in base sensor. remove from
> > all
> > > > > > > > derivatives. ban the reintroduction of reschedule mode in
> > > derivative
> > > > > > > > sensors.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, 13 Nov 2024 at 19:46, Ash Berlin-Taylor <
> a...@apache.org>
> > > > > wrote:
> > > > > > >
> > > > > > > > > Reschedulability in airflow (making up a word here) to my
> > > knowledge
> > > > > > > only
> > > > > > > > exists to support reschedule mode in sensors.
> > > > > > > >
> > > > > > > > Yes, this is true.
> > > > > > > >
> > > > > > > >
> > > > > > > > On 13 November 2024 19:32:23 GMT, Daniel Standish
> > > > > > > > <daniel.stand...@astronomer.io.INVALID> wrote:
> > > > > > > > >The options are a bit muddled here.
> > > > > > > > >
> > > > > > > > >I think we need to separate what to do about the sensors we
> > > have in
> > > > > > > > >providers from the core interface, essentially.
> > > > > > > > >
> > > > > > > > >Reschedulability in airflow (making up a word here) to my
> > > knowledge
> > > > > > only
> > > > > > > > >exists to support reschedule mode in sensors.
> > > > > > > > >
> > > > > > > > >Let's also separate poke from reschedule.
> > > > > > > > >
> > > > > > > > >For now, let's just look at reschedule.
> > > > > > > > >
> > > > > > > > >Let me enumerate some paths as i see them.
> > > > > > > > >
> > > > > > > > >1. remove the rescheduling interface in core.  No more "up
> for
> > > > > > > reschedule"
> > > > > > > > >state etc.  No more reschedule mode in base sensor. No more
> > > > > reschedule
> > > > > > > > mode
> > > > > > > > >in any derivative sensor.
> > > > > > > > >2. keep rescheduling in core. remove from base sensor and
> > > > > derivatives.
> > > > > > > > >3. keep reschedule in core. keep in base sensor. remove from
> > all
> > > > > > > > >derivatives. ban the reintroduction of reschedule mode in
> > > derivative
> > > > > > > > >sensors.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > For additional commands, e-mail: dev-h...@airflow.apache.org
> > >
> > >
> >
>

Reply via email to