Re: [DISCUSSION] Replace Poke & Reschedule mode from Sensors for Airflow 3 in favor of Deferrable

Vikram Koka Tue, 14 Jan 2025 12:06:41 -0800

Thanks Kaxil, I agree with that conclusion.

Thanks everyone for participating in the discussion.



On Tue, Jan 14, 2025 at 11:46 AM Kaxil Naik <[email protected]> wrote:

> Thanks everyone for the discussion. The concerns raised about deferrable
> are valid and given the time and scope we already have for 3.0, let's defer
> this to post 3.0.
>
> Let's defer the defer conversation to 3.x :)
>
> tldr: Things will be kept as-is!
>
> Regards,
> Kaxil
>
> On Tue, 26 Nov 2024 at 07:05, Andrew Gibbs <[email protected]> wrote:
>
> > TL;DR: poke, reschedule and deferrable each have their own use cases and
> > trade offs. Please don't get rid of any of them - they provide much
> needed
> > flexibility.
> >
> > One other problem with deferrable is that the capacity of the triggerer
> is
> > finite. AIUI, it selects N sensors to run, and runs them. The N+1'th
> sensor
> > is not picked up until the triggerer is done with at least one of the
> > original sensors.
> >
> > Reschedule sensors on a dedicated queue can round robin, with essentially
> > infinite capacity. If you have a lot (like 1k+) of sensor instances which
> > are each fast to run (e.g. S3 sensors, or a REST poll) and you don't care
> > about individual latency so long as they each get a turn in the sun, then
> > deferrable is not going to scale as you need to add more and more
> > triggerers to ensure that every sensor is covered.
> >
> > If you have a small / fixed number of sensors, deferrable can be great.
> If
> > you have a dynamic set up where the number of sensors can easily change,
> > triggerer can cause problems / require ever more resources.
> >
> > There is a break even point where running a couple of workers and having
> a
> > looooong tail of queued sensors uses less resources than the scheduler.
> >
> > As was pointed out earlier, poke is great for low latency but anathema to
> > cloud based set ups, due to the cost of running a dedicated process.
> >
> > Thanks,
> > Andrew
> >
> >
> >
> > On Mon, 25 Nov 2024, 18:46 Daniel Standish,
> > <[email protected]> wrote:
> >
> > > And, I have updated it a bit.  Ready for review in case anyone wants to
> > > have a look.  Basically, after this pr, to have sensor timeout
> enabled, a
> > > deferrable sensor just needs to ensure that when it defers, it passes
> its
> > > timeout to the trigger.  Then base sensor will detect when the trigger
> > > ended in a timeout, and will reraise as airflow sensor timeout, and
> fail
> > > immediately. To keep the PR focused, it does not change all existing
> > > sensors; just one.  But it will be pretty straightforward to do the
> rest
> > of
> > > them.
> > >
> > >
> > > On Fri, Nov 22, 2024 at 2:01 PM Daniel Standish <
> > > [email protected]> wrote:
> > >
> > > > Alright, i can take a hint
> > > >
> > > > rebased https://github.com/apache/airflow/pull/33718
> > > >
> > > > On Wed, Nov 20, 2024 at 6:23 PM Jarek Potiuk <[email protected]>
> wrote:
> > > >
> > > >> Also another good case that just happened -
> > > >> https://github.com/apache/airflow/discussions/44211  a user added a
> > > very
> > > >> valid question on overloading triggerers (i.e. trigger scalability).
> > If
> > > we
> > > >> are able to answer the question of the user now - this would be
> great
> > (I
> > > >> can't).
> > > >>
> > > >> But I think the answer won't be pretty (i.e. we can't help with that
> > > now)
> > > >> -
> > > >> so my question is how confident is that we can support such user's
> > > >> questions and how confident current scalability promises for example
> > can
> > > >> be
> > > >> met ?
> > > >>
> > > >> J.
> > > >>
> > > >>
> > > >> On Wed, Nov 20, 2024 at 11:11 PM Jarek Potiuk <[email protected]>
> > wrote:
> > > >>
> > > >> > Actually that's an extremely good point and I agree with Elad
> here.
> > > >> >
> > > >> > If we commit to something as first priority, we should treat
> issues
> > we
> > > >> > have with it with priority.
> > > >> > There is another - very similar - issue with on_kill handling for
> > > >> > deferrables that I just recalled.
> > > >> >
> > > >> > The issue is here: https://github.com/apache/airflow/issues/36090
> > > that
> > > >> > Matthew Wicks has been following on multiple times, and proposed a
> > way
> > > >> to
> > > >> > approach it in general way - but the only solution we were able to
> > > come
> > > >> up
> > > >> > with was to implement workaround in a few deferrable operators.
> > > >> >
> > > >> > I am here with Elad that until we solve this and prioritise other
> > > issues
> > > >> > there, making deferrable a default is a bad idea.
> > > >> >
> > > >> > J.
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Wed, Nov 20, 2024 at 6:56 PM Elad Kalif <[email protected]>
> > > wrote:
> > > >> >
> > > >> >> I don't think deferrable can act as a replacement at this point.
> > > >> >> Removing/changing defaults of poke and reschedule in favor of
> > > >> deferrable
> > > >> >> means that deferrable becomes a critical feature. This means that
> > any
> > > >> bug
> > > >> >> related to it must be triaged and fixed ASAP.
> > > >> >> We are not there yet. I don't think many maintainers know this
> area
> > > of
> > > >> the
> > > >> >> code.
> > > >> >>
> > > >> >> In my perspective as a user I have a functionality that is
> working
> > > >> fine. I
> > > >> >> learned to trust it. I know how it behaves and am able to write
> > code
> > > >> >> efficiently with it. The suggested replacement is a feature that
> > has
> > > >> 30ish
> > > >> >> bug reports with little community focus. That is a lot of risk
> with
> > > >> small
> > > >> >> profit to gain. For me this is an indication of -1 to this
> > proposal.
> > > >> >>
> > > >> >> For example issue https://github.com/apache/airflow/issues/36734
> > > with
> > > >> >> suggested PR: https://github.com/apache/airflow/pull/33718
> > > >> >> I admit I am not into the details but if we accept this proposal
> > this
> > > >> bug
> > > >> >> becomes top priority problem and this is open since 2023.
> > > >> >>
> > > >> >> So my position is -1 for any change about poke and reschedule
> (even
> > > for
> > > >> >> changing defaults).
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> On Thu, Nov 14, 2024 at 10:28 PM Jarek Potiuk <[email protected]>
> > > >> wrote:
> > > >> >>
> > > >> >> > > I think that if we have implemented async for an operator, we
> > > >> should
> > > >> >> > remove
> > > >> >> > the non-async version.
> > > >> >> >
> > > >> >> > I think we all agree here. The question is only "when". I see
> two
> > > >> >> options:
> > > >> >> >
> > > >> >> > 1) When Airflow 3 is released
> > > >> >> > 2) When Airflow 2 stops being supported
> > > >> >> >
> > > >> >> > I'd be for 2) .
> > > >> >> >
> > > >> >> > Why? Because it has the least impact on our users while it does
> > not
> > > >> make
> > > >> >> > our life much harder, while we might commit to clean-up
> > eventually.
> > > >> >> >
> > > >> >> > I personally (with my ADHD personality twist) often have an
> urge
> > to
> > > >> fix
> > > >> >> > things "now". It's very rewarding and has great "feedback" of
> > > feeling
> > > >> >> > better. But I think the next best thing is to have a clear
> > timeline
> > > >> to
> > > >> >> > remove things like that and relentlessly follow it through. And
> > it
> > > >> has
> > > >> >> the
> > > >> >> > added benefits of avoiding surprises for users who follow our
> > > >> decisions
> > > >> >> and
> > > >> >> > policies.
> > > >> >> >
> > > >> >> > I think the biggest problem I have with things that are
> > deprecated
> > > >> are
> > > >> >> not
> > > >> >> > that they are lying around, but with the feeling that they will
> > > stay
> > > >> >> there
> > > >> >> > forever (aka - we will not clean our technical debt). I think
> > > having
> > > >> a
> > > >> >> > policy when we remove things and following that, avoids that
> > > problem
> > > >> (as
> > > >> >> > long as we do follow).
> > > >> >> >
> > > >> >> > See for example this PR for Google Provider that Max opened
> > today:
> > > >> >> > https://github.com/apache/airflow/pull/43953.= ~ -8000 lines
> of
> > > >> code we
> > > >> >> > will not have to maintain. and whoever followed it, they had 5
> > > months
> > > >> >> or so
> > > >> >> > to adjust as Google team set some rules for that and in most
> > cases
> > > >> set
> > > >> >> > the deprecation date. This is cool. Most of those warnings
> have a
> > > >> >> > "planned_removal_date" set.
> > > >> >> >
> > > >> >> > And I think it also shows our maturity - that we do not "jump"
> on
> > > >> >> changing
> > > >> >> > something, but we plan it, we have a reason behind it, whe know
> > why
> > > >> we
> > > >> >> are
> > > >> >> > doing it. what are consequences to our users, and follow
> through,
> > > >> giving
> > > >> >> > enough warning to the users (those who care to listen).
> > > >> >> >
> > > >> >> > And the more we do it, the more we plan and announce in advance
> > and
> > > >> >> > relentlessly follow it through, the more our users will listen
> to
> > > >> those
> > > >> >> > things.
> > > >> >> >
> > > >> >> > J.
> > > >> >> >
> > > >> >> >
> > > >> >> > On Thu, Nov 14, 2024 at 4:12 PM Daniel Standish
> > > >> >> > <[email protected]> wrote:
> > > >> >> >
> > > >> >> > > To me, I don't really mind reschedule mode so much.  But I
> > > *really
> > > >> >> *don't
> > > >> >> > > like that we have *both* styles implemented in operators
> > (talking
> > > >> >> about
> > > >> >> > > providers here).
> > > >> >> > >
> > > >> >> > > I think that if we have implemented async for an operator, we
> > > >> should
> > > >> >> > remove
> > > >> >> > > the non-async version.
> > > >> >> > >
> > > >> >> > > And we should remove the `deferrable` param for 3.0 -- this
> is
> > > >> >> something
> > > >> >> > > that to me makes no sense, since it implies that sensors
> should
> > > >> have
> > > >> >> both
> > > >> >> > > ways implemented.
> > > >> >> > >
> > > >> >> > > But users should still be allowed to write rescheduling
> > sensors.
> > > >> >> > >
> > > >> >> >
> > > >> >>
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: [DISCUSSION] Replace Poke & Reschedule mode from Sensors for Airflow 3 in favor of Deferrable

Reply via email to