Thanks Kaxil, I agree with that conclusion. Thanks everyone for participating in the discussion.
On Tue, Jan 14, 2025 at 11:46 AM Kaxil Naik <kaxiln...@gmail.com> wrote: > Thanks everyone for the discussion. The concerns raised about deferrable > are valid and given the time and scope we already have for 3.0, let's defer > this to post 3.0. > > Let's defer the defer conversation to 3.x :) > > tldr: Things will be kept as-is! > > Regards, > Kaxil > > On Tue, 26 Nov 2024 at 07:05, Andrew Gibbs <and...@gibbs.io> wrote: > > > TL;DR: poke, reschedule and deferrable each have their own use cases and > > trade offs. Please don't get rid of any of them - they provide much > needed > > flexibility. > > > > One other problem with deferrable is that the capacity of the triggerer > is > > finite. AIUI, it selects N sensors to run, and runs them. The N+1'th > sensor > > is not picked up until the triggerer is done with at least one of the > > original sensors. > > > > Reschedule sensors on a dedicated queue can round robin, with essentially > > infinite capacity. If you have a lot (like 1k+) of sensor instances which > > are each fast to run (e.g. S3 sensors, or a REST poll) and you don't care > > about individual latency so long as they each get a turn in the sun, then > > deferrable is not going to scale as you need to add more and more > > triggerers to ensure that every sensor is covered. > > > > If you have a small / fixed number of sensors, deferrable can be great. > If > > you have a dynamic set up where the number of sensors can easily change, > > triggerer can cause problems / require ever more resources. > > > > There is a break even point where running a couple of workers and having > a > > looooong tail of queued sensors uses less resources than the scheduler. > > > > As was pointed out earlier, poke is great for low latency but anathema to > > cloud based set ups, due to the cost of running a dedicated process. > > > > Thanks, > > Andrew > > > > > > > > On Mon, 25 Nov 2024, 18:46 Daniel Standish, > > <daniel.stand...@astronomer.io.invalid> wrote: > > > > > And, I have updated it a bit. Ready for review in case anyone wants to > > > have a look. Basically, after this pr, to have sensor timeout > enabled, a > > > deferrable sensor just needs to ensure that when it defers, it passes > its > > > timeout to the trigger. Then base sensor will detect when the trigger > > > ended in a timeout, and will reraise as airflow sensor timeout, and > fail > > > immediately. To keep the PR focused, it does not change all existing > > > sensors; just one. But it will be pretty straightforward to do the > rest > > of > > > them. > > > > > > > > > On Fri, Nov 22, 2024 at 2:01 PM Daniel Standish < > > > daniel.stand...@astronomer.io> wrote: > > > > > > > Alright, i can take a hint > > > > > > > > rebased https://github.com/apache/airflow/pull/33718 > > > > > > > > On Wed, Nov 20, 2024 at 6:23 PM Jarek Potiuk <ja...@potiuk.com> > wrote: > > > > > > > >> Also another good case that just happened - > > > >> https://github.com/apache/airflow/discussions/44211 a user added a > > > very > > > >> valid question on overloading triggerers (i.e. trigger scalability). > > If > > > we > > > >> are able to answer the question of the user now - this would be > great > > (I > > > >> can't). > > > >> > > > >> But I think the answer won't be pretty (i.e. we can't help with that > > > now) > > > >> - > > > >> so my question is how confident is that we can support such user's > > > >> questions and how confident current scalability promises for example > > can > > > >> be > > > >> met ? > > > >> > > > >> J. > > > >> > > > >> > > > >> On Wed, Nov 20, 2024 at 11:11 PM Jarek Potiuk <ja...@potiuk.com> > > wrote: > > > >> > > > >> > Actually that's an extremely good point and I agree with Elad > here. > > > >> > > > > >> > If we commit to something as first priority, we should treat > issues > > we > > > >> > have with it with priority. > > > >> > There is another - very similar - issue with on_kill handling for > > > >> > deferrables that I just recalled. > > > >> > > > > >> > The issue is here: https://github.com/apache/airflow/issues/36090 > > > that > > > >> > Matthew Wicks has been following on multiple times, and proposed a > > way > > > >> to > > > >> > approach it in general way - but the only solution we were able to > > > come > > > >> up > > > >> > with was to implement workaround in a few deferrable operators. > > > >> > > > > >> > I am here with Elad that until we solve this and prioritise other > > > issues > > > >> > there, making deferrable a default is a bad idea. > > > >> > > > > >> > J. > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > On Wed, Nov 20, 2024 at 6:56 PM Elad Kalif <elad...@apache.org> > > > wrote: > > > >> > > > > >> >> I don't think deferrable can act as a replacement at this point. > > > >> >> Removing/changing defaults of poke and reschedule in favor of > > > >> deferrable > > > >> >> means that deferrable becomes a critical feature. This means that > > any > > > >> bug > > > >> >> related to it must be triaged and fixed ASAP. > > > >> >> We are not there yet. I don't think many maintainers know this > area > > > of > > > >> the > > > >> >> code. > > > >> >> > > > >> >> In my perspective as a user I have a functionality that is > working > > > >> fine. I > > > >> >> learned to trust it. I know how it behaves and am able to write > > code > > > >> >> efficiently with it. The suggested replacement is a feature that > > has > > > >> 30ish > > > >> >> bug reports with little community focus. That is a lot of risk > with > > > >> small > > > >> >> profit to gain. For me this is an indication of -1 to this > > proposal. > > > >> >> > > > >> >> For example issue https://github.com/apache/airflow/issues/36734 > > > with > > > >> >> suggested PR: https://github.com/apache/airflow/pull/33718 > > > >> >> I admit I am not into the details but if we accept this proposal > > this > > > >> bug > > > >> >> becomes top priority problem and this is open since 2023. > > > >> >> > > > >> >> So my position is -1 for any change about poke and reschedule > (even > > > for > > > >> >> changing defaults). > > > >> >> > > > >> >> > > > >> >> > > > >> >> On Thu, Nov 14, 2024 at 10:28 PM Jarek Potiuk <ja...@potiuk.com> > > > >> wrote: > > > >> >> > > > >> >> > > I think that if we have implemented async for an operator, we > > > >> should > > > >> >> > remove > > > >> >> > the non-async version. > > > >> >> > > > > >> >> > I think we all agree here. The question is only "when". I see > two > > > >> >> options: > > > >> >> > > > > >> >> > 1) When Airflow 3 is released > > > >> >> > 2) When Airflow 2 stops being supported > > > >> >> > > > > >> >> > I'd be for 2) . > > > >> >> > > > > >> >> > Why? Because it has the least impact on our users while it does > > not > > > >> make > > > >> >> > our life much harder, while we might commit to clean-up > > eventually. > > > >> >> > > > > >> >> > I personally (with my ADHD personality twist) often have an > urge > > to > > > >> fix > > > >> >> > things "now". It's very rewarding and has great "feedback" of > > > feeling > > > >> >> > better. But I think the next best thing is to have a clear > > timeline > > > >> to > > > >> >> > remove things like that and relentlessly follow it through. And > > it > > > >> has > > > >> >> the > > > >> >> > added benefits of avoiding surprises for users who follow our > > > >> decisions > > > >> >> and > > > >> >> > policies. > > > >> >> > > > > >> >> > I think the biggest problem I have with things that are > > deprecated > > > >> are > > > >> >> not > > > >> >> > that they are lying around, but with the feeling that they will > > > stay > > > >> >> there > > > >> >> > forever (aka - we will not clean our technical debt). I think > > > having > > > >> a > > > >> >> > policy when we remove things and following that, avoids that > > > problem > > > >> (as > > > >> >> > long as we do follow). > > > >> >> > > > > >> >> > See for example this PR for Google Provider that Max opened > > today: > > > >> >> > https://github.com/apache/airflow/pull/43953.= ~ -8000 lines > of > > > >> code we > > > >> >> > will not have to maintain. and whoever followed it, they had 5 > > > months > > > >> >> or so > > > >> >> > to adjust as Google team set some rules for that and in most > > cases > > > >> set > > > >> >> > the deprecation date. This is cool. Most of those warnings > have a > > > >> >> > "planned_removal_date" set. > > > >> >> > > > > >> >> > And I think it also shows our maturity - that we do not "jump" > on > > > >> >> changing > > > >> >> > something, but we plan it, we have a reason behind it, whe know > > why > > > >> we > > > >> >> are > > > >> >> > doing it. what are consequences to our users, and follow > through, > > > >> giving > > > >> >> > enough warning to the users (those who care to listen). > > > >> >> > > > > >> >> > And the more we do it, the more we plan and announce in advance > > and > > > >> >> > relentlessly follow it through, the more our users will listen > to > > > >> those > > > >> >> > things. > > > >> >> > > > > >> >> > J. > > > >> >> > > > > >> >> > > > > >> >> > On Thu, Nov 14, 2024 at 4:12 PM Daniel Standish > > > >> >> > <daniel.stand...@astronomer.io.invalid> wrote: > > > >> >> > > > > >> >> > > To me, I don't really mind reschedule mode so much. But I > > > *really > > > >> >> *don't > > > >> >> > > like that we have *both* styles implemented in operators > > (talking > > > >> >> about > > > >> >> > > providers here). > > > >> >> > > > > > >> >> > > I think that if we have implemented async for an operator, we > > > >> should > > > >> >> > remove > > > >> >> > > the non-async version. > > > >> >> > > > > > >> >> > > And we should remove the `deferrable` param for 3.0 -- this > is > > > >> >> something > > > >> >> > > that to me makes no sense, since it implies that sensors > should > > > >> have > > > >> >> both > > > >> >> > > ways implemented. > > > >> >> > > > > > >> >> > > But users should still be allowed to write rescheduling > > sensors. > > > >> >> > > > > > >> >> > > > > >> >> > > > >> > > > > >> > > > > > > > > > >