TL;DR: poke, reschedule and deferrable each have their own use cases and trade offs. Please don't get rid of any of them - they provide much needed flexibility.
One other problem with deferrable is that the capacity of the triggerer is finite. AIUI, it selects N sensors to run, and runs them. The N+1'th sensor is not picked up until the triggerer is done with at least one of the original sensors. Reschedule sensors on a dedicated queue can round robin, with essentially infinite capacity. If you have a lot (like 1k+) of sensor instances which are each fast to run (e.g. S3 sensors, or a REST poll) and you don't care about individual latency so long as they each get a turn in the sun, then deferrable is not going to scale as you need to add more and more triggerers to ensure that every sensor is covered. If you have a small / fixed number of sensors, deferrable can be great. If you have a dynamic set up where the number of sensors can easily change, triggerer can cause problems / require ever more resources. There is a break even point where running a couple of workers and having a looooong tail of queued sensors uses less resources than the scheduler. As was pointed out earlier, poke is great for low latency but anathema to cloud based set ups, due to the cost of running a dedicated process. Thanks, Andrew On Mon, 25 Nov 2024, 18:46 Daniel Standish, <daniel.stand...@astronomer.io.invalid> wrote: > And, I have updated it a bit. Ready for review in case anyone wants to > have a look. Basically, after this pr, to have sensor timeout enabled, a > deferrable sensor just needs to ensure that when it defers, it passes its > timeout to the trigger. Then base sensor will detect when the trigger > ended in a timeout, and will reraise as airflow sensor timeout, and fail > immediately. To keep the PR focused, it does not change all existing > sensors; just one. But it will be pretty straightforward to do the rest of > them. > > > On Fri, Nov 22, 2024 at 2:01 PM Daniel Standish < > daniel.stand...@astronomer.io> wrote: > > > Alright, i can take a hint > > > > rebased https://github.com/apache/airflow/pull/33718 > > > > On Wed, Nov 20, 2024 at 6:23 PM Jarek Potiuk <ja...@potiuk.com> wrote: > > > >> Also another good case that just happened - > >> https://github.com/apache/airflow/discussions/44211 a user added a > very > >> valid question on overloading triggerers (i.e. trigger scalability). If > we > >> are able to answer the question of the user now - this would be great (I > >> can't). > >> > >> But I think the answer won't be pretty (i.e. we can't help with that > now) > >> - > >> so my question is how confident is that we can support such user's > >> questions and how confident current scalability promises for example can > >> be > >> met ? > >> > >> J. > >> > >> > >> On Wed, Nov 20, 2024 at 11:11 PM Jarek Potiuk <ja...@potiuk.com> wrote: > >> > >> > Actually that's an extremely good point and I agree with Elad here. > >> > > >> > If we commit to something as first priority, we should treat issues we > >> > have with it with priority. > >> > There is another - very similar - issue with on_kill handling for > >> > deferrables that I just recalled. > >> > > >> > The issue is here: https://github.com/apache/airflow/issues/36090 > that > >> > Matthew Wicks has been following on multiple times, and proposed a way > >> to > >> > approach it in general way - but the only solution we were able to > come > >> up > >> > with was to implement workaround in a few deferrable operators. > >> > > >> > I am here with Elad that until we solve this and prioritise other > issues > >> > there, making deferrable a default is a bad idea. > >> > > >> > J. > >> > > >> > > >> > > >> > > >> > On Wed, Nov 20, 2024 at 6:56 PM Elad Kalif <elad...@apache.org> > wrote: > >> > > >> >> I don't think deferrable can act as a replacement at this point. > >> >> Removing/changing defaults of poke and reschedule in favor of > >> deferrable > >> >> means that deferrable becomes a critical feature. This means that any > >> bug > >> >> related to it must be triaged and fixed ASAP. > >> >> We are not there yet. I don't think many maintainers know this area > of > >> the > >> >> code. > >> >> > >> >> In my perspective as a user I have a functionality that is working > >> fine. I > >> >> learned to trust it. I know how it behaves and am able to write code > >> >> efficiently with it. The suggested replacement is a feature that has > >> 30ish > >> >> bug reports with little community focus. That is a lot of risk with > >> small > >> >> profit to gain. For me this is an indication of -1 to this proposal. > >> >> > >> >> For example issue https://github.com/apache/airflow/issues/36734 > with > >> >> suggested PR: https://github.com/apache/airflow/pull/33718 > >> >> I admit I am not into the details but if we accept this proposal this > >> bug > >> >> becomes top priority problem and this is open since 2023. > >> >> > >> >> So my position is -1 for any change about poke and reschedule (even > for > >> >> changing defaults). > >> >> > >> >> > >> >> > >> >> On Thu, Nov 14, 2024 at 10:28 PM Jarek Potiuk <ja...@potiuk.com> > >> wrote: > >> >> > >> >> > > I think that if we have implemented async for an operator, we > >> should > >> >> > remove > >> >> > the non-async version. > >> >> > > >> >> > I think we all agree here. The question is only "when". I see two > >> >> options: > >> >> > > >> >> > 1) When Airflow 3 is released > >> >> > 2) When Airflow 2 stops being supported > >> >> > > >> >> > I'd be for 2) . > >> >> > > >> >> > Why? Because it has the least impact on our users while it does not > >> make > >> >> > our life much harder, while we might commit to clean-up eventually. > >> >> > > >> >> > I personally (with my ADHD personality twist) often have an urge to > >> fix > >> >> > things "now". It's very rewarding and has great "feedback" of > feeling > >> >> > better. But I think the next best thing is to have a clear timeline > >> to > >> >> > remove things like that and relentlessly follow it through. And it > >> has > >> >> the > >> >> > added benefits of avoiding surprises for users who follow our > >> decisions > >> >> and > >> >> > policies. > >> >> > > >> >> > I think the biggest problem I have with things that are deprecated > >> are > >> >> not > >> >> > that they are lying around, but with the feeling that they will > stay > >> >> there > >> >> > forever (aka - we will not clean our technical debt). I think > having > >> a > >> >> > policy when we remove things and following that, avoids that > problem > >> (as > >> >> > long as we do follow). > >> >> > > >> >> > See for example this PR for Google Provider that Max opened today: > >> >> > https://github.com/apache/airflow/pull/43953.= ~ -8000 lines of > >> code we > >> >> > will not have to maintain. and whoever followed it, they had 5 > months > >> >> or so > >> >> > to adjust as Google team set some rules for that and in most cases > >> set > >> >> > the deprecation date. This is cool. Most of those warnings have a > >> >> > "planned_removal_date" set. > >> >> > > >> >> > And I think it also shows our maturity - that we do not "jump" on > >> >> changing > >> >> > something, but we plan it, we have a reason behind it, whe know why > >> we > >> >> are > >> >> > doing it. what are consequences to our users, and follow through, > >> giving > >> >> > enough warning to the users (those who care to listen). > >> >> > > >> >> > And the more we do it, the more we plan and announce in advance and > >> >> > relentlessly follow it through, the more our users will listen to > >> those > >> >> > things. > >> >> > > >> >> > J. > >> >> > > >> >> > > >> >> > On Thu, Nov 14, 2024 at 4:12 PM Daniel Standish > >> >> > <daniel.stand...@astronomer.io.invalid> wrote: > >> >> > > >> >> > > To me, I don't really mind reschedule mode so much. But I > *really > >> >> *don't > >> >> > > like that we have *both* styles implemented in operators (talking > >> >> about > >> >> > > providers here). > >> >> > > > >> >> > > I think that if we have implemented async for an operator, we > >> should > >> >> > remove > >> >> > > the non-async version. > >> >> > > > >> >> > > And we should remove the `deferrable` param for 3.0 -- this is > >> >> something > >> >> > > that to me makes no sense, since it implies that sensors should > >> have > >> >> both > >> >> > > ways implemented. > >> >> > > > >> >> > > But users should still be allowed to write rescheduling sensors. > >> >> > > > >> >> > > >> >> > >> > > >> > > >