> Personally, I’m a bit skeptical that we will come up with a metric-based heuristic that works well in most scenarios and doesn’t require significant knowledge and tuning.
> I think past implementations of the dynamic snitch are good evidence of that.

I am more optimistic on that front. I think we can achieve a lot. However, in my opinion, we need to focus on balancing the load rather than rate limiting. Rate limiting is going to be important if/when we decide to implement workload isolation. Until then, I think we should focus on three things:

* Node health (nodes should produce useful work and should be stable and not overloaded)
* Latency (we always need to find an optimal way to process requests and minimize overall queueing time)
* Fairness (avoid workload and utilization imbalances)

All three points are achievable with very straightforward approaches that will not require much operator involvement. I guess my main point is that we need to solve load balancing (summarized by the above three points) before we start working on the rate limiter, but there’s a good chance we may not need one apart from use cases that require workload isolation.
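To make that concrete, here is a minimal, purely illustrative sketch of a coordinator-side replica ranking that folds the three signals together. All class names, fields, and weights below are invented for the example; none of this is an existing C* API.

import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: rank candidate replicas by a blended score of node health,
// observed latency, and outstanding work (fairness). Hypothetical names throughout.
public final class ReplicaScorer
{
    public static final class ReplicaStats
    {
        final String endpoint;
        final double healthPenalty;     // 0.0 = healthy, 1.0 = overloaded (e.g. dropped messages, GC pressure)
        final double emaLatencyMillis;  // moving average of observed response time
        final long inFlightRequests;    // requests dispatched to this replica and not yet acknowledged

        ReplicaStats(String endpoint, double healthPenalty, double emaLatencyMillis, long inFlightRequests)
        {
            this.endpoint = endpoint;
            this.healthPenalty = healthPenalty;
            this.emaLatencyMillis = emaLatencyMillis;
            this.inFlightRequests = inFlightRequests;
        }
    }

    // Lower score = more attractive replica; the weights are arbitrary for the example.
    static double score(ReplicaStats r)
    {
        double latencyTerm  = r.emaLatencyMillis;        // minimize queueing + service time
        double fairnessTerm = r.inFlightRequests * 2.0;  // spread work away from replicas we already lean on
        double healthTerm   = r.healthPenalty * 1000.0;  // strongly avoid unhealthy or overloaded nodes
        return latencyTerm + fairnessTerm + healthTerm;
    }

    // Candidate replicas ordered from most to least preferred.
    public static List<ReplicaStats> rank(List<ReplicaStats> candidates)
    {
        return candidates.stream()
                         .sorted(Comparator.comparingDouble(ReplicaScorer::score))
                         .collect(Collectors.toList());
    }
}

The details don’t matter; the idea is that all three signals come from measurements the node already takes, so very little operator tuning should be needed.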
On Fri, Sep 20, 2024, at 8:14 PM, Jordan West wrote:
> +1 to Benedict’s (and others’) comments on pluggability and low overhead when disabled.
> The latter I think needs little justification. The reason I am big on the former is that, in my opinion, decisions on approach need to be settled with numbers, not anecdotes or past experience (including my own).
> So I would like to see us compare different approaches (what metrics to use, etc.).
>
> Personally, I’m a bit skeptical that we will come up with a metric-based heuristic that works well in most scenarios and doesn’t require significant knowledge and tuning.
> I think past implementations of the dynamic snitch are good evidence of that.
> However, I expressed the same concerns internally for a client-level project where we exposed metrics to induce back pressure, and early experiments are encouraging / contrary to my expectations.
> At different layers different approaches can work better or worse. Same with different workloads.
> I don’t think we should dismiss approaches outright in this thread without hard numbers.
>
> In short, I think the testing and evaluation of this CEP is as important as its design and implementation.
> We will need to test a wide variety of workloads and potentially implementations, and that’s where pluggability will be a huge benefit.
> I would go as far as saying the CEP should focus more on a framework for pluggable implementations that has low to zero cost when disabled than on a specific set of metrics to use or a specific approach.
>
> Jordan
>
> On Thu, Sep 19, 2024 at 14:38 Benedict Elliott Smith <bened...@apache.org> wrote:
>> I just want to flag here that this is a topic I have strong opinions on, but the CEP is not really specific or detailed enough to understand precisely how it will be implemented.
>> So, if a patch is already being produced, most of my feedback is likely to be provided some time after a patch appears, through the normal review process. I want to flag this now to avoid any surprise.
>>
>> I will say upfront that, ideally, this system should be designed to have ~zero overhead when disabled, and with minimal coupling (between its own components and C* itself), so that entirely orthogonal approaches can be integrated in future without polluting the codebase.
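As a purely hypothetical sketch of the "pluggable, ~zero cost when disabled" shape Benedict and Jordan are describing: the interface, names, and config mechanism below are invented for illustration, not an existing or proposed C* API.

// Hypothetical sketch of a pluggable admission-control seam. The disabled default is a
// constant no-op, so a cluster that never enables it pays only a trivial interface call,
// and concrete strategies stay decoupled from the rest of the codebase.
public interface AdmissionController
{
    enum Decision { ACCEPT, REJECT, DEFER }

    // Minimal, deliberately C*-agnostic view of a unit of work (all fields illustrative).
    record RequestContext(String verb, int payloadBytes, long enqueuedNanos) {}

    // The single decision point a pluggable strategy implements.
    Decision admit(RequestContext context);

    // Disabled behaviour: accept everything, allocate nothing.
    AdmissionController DISABLED = context -> Decision.ACCEPT;

    // A strategy could be chosen by class name in configuration; empty means disabled.
    static AdmissionController load(String className) throws ReflectiveOperationException
    {
        if (className == null || className.isEmpty())
            return DISABLED;
        return (AdmissionController) Class.forName(className).getDeclaredConstructor().newInstance();
    }
}

Comparing several such strategies under different workloads, as Jordan suggests, then becomes a benchmarking exercise rather than an up-front design bet.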
>>> On 19 Sep 2024, at 19:14, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>
>>> The work has begun but we don't have a VOTE thread for this CEP. Can one get started?
>>>
>>> On Mon, May 6, 2024 at 9:24 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>> Sure, Caleb. I will include the work as part of CASSANDRA-19534 <https://issues.apache.org/jira/browse/CASSANDRA-19534> in CEP-41.
>>>>
>>>> Jaydeep
>>>>
>>>> On Fri, May 3, 2024 at 7:48 AM Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>>>> FYI, there is some sort-of-related work going on in CASSANDRA-19534 <https://issues.apache.org/jira/browse/CASSANDRA-19534>
>>>>>
>>>>> On Wed, Apr 10, 2024 at 6:35 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>>>> Just created an official CEP-41 <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-41+%28DRAFT%29+Apache+Cassandra+Unified+Rate+Limiter> incorporating the feedback from this discussion.
>>>>>> Feel free to let me know if I may have missed some important feedback in this thread that is not captured in CEP-41.
>>>>>>
>>>>>> Jaydeep
>>>>>>
>>>>>> On Thu, Feb 22, 2024 at 11:36 AM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>>>>> Thanks, Josh. I will file an official CEP with all the details in a few days and update this thread with that CEP number.
>>>>>>> Thanks a lot everyone for providing valuable insights!
>>>>>>>
>>>>>>> Jaydeep
>>>>>>>
>>>>>>> On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>>>>> Do folks think we should file an official CEP and take it there?
>>>>>>>> +1 here.
>>>>>>>>
>>>>>>>> Synthesizing your gdoc, Caleb's work, and the feedback from this thread into a draft seems like a solid next step.
>>>>>>>>
>>>>>>>> On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote:
>>>>>>>>> I see a lot of great ideas that have been discussed or proposed in the past to cover the most common rate limiter candidate use cases.
>>>>>>>>> Do folks think we should file an official CEP and take it there?
>>>>>>>>>
>>>>>>>>> Jaydeep
>>>>>>>>>
>>>>>>>>> On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>>>>>>>>> I just remembered the other day that I had done a quick writeup on the state of compaction stress-related throttling in the project:
>>>>>>>>>>
>>>>>>>>>> https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
>>>>>>>>>>
>>>>>>>>>> I'm sure most of it is old news to the people on this thread, but I figured I'd post it just in case :)
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>>>>>>>> 2.) We should make sure the links between the "known" root causes of cascading failures and the mechanisms we introduce to avoid them remain very strong.
>>>>>>>>>>> Seems to me that our historical strategy was to address individual known cases one-by-one rather than looking for a more holistic load-balancing and load-shedding solution.
>>>>>>>>>>> While the engineer in me likes the elegance of a broad, more-inclusive *actual SEDA-like* approach, the pragmatist in me wonders how far we think we are today from a stable set-point.
>>>>>>>>>>>
>>>>>>>>>>> i.e. are we facing a handful of cases where nodes can still get pushed over and then cascade that we can surgically address, or are we facing a broader lack of back-pressure that rears its head in different domains (client -> coordinator, coordinator -> replica, internode with other operations, etc.) at surprising times and should be considered more holistically?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
>>>>>>>>>>>> I almost forgot CASSANDRA-15817, which introduced reject_repair_compaction_threshold, which provides a mechanism to stop repairs while compaction is underwater.
>>>>>>>>>>>>
>>>>>>>>>>>>> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm a bit late to the discussion. I see that we've already discussed CASSANDRA-15013 <https://issues.apache.org/jira/browse/CASSANDRA-15013> and CASSANDRA-16663 <https://issues.apache.org/jira/browse/CASSANDRA-16663> at least in passing.
>>>>>>>>>>>>> Having written the latter, I'd be the first to admit it's a crude tool, although it's been useful here and there, and it provides a couple of primitives that may be useful for future work.
>>>>>>>>>>>>> As Scott mentions, while it is configurable at runtime, it is not adaptive, although we did make configuration easier in CASSANDRA-17423 <https://issues.apache.org/jira/browse/CASSANDRA-17423>.
>>>>>>>>>>>>> It is also global to the node, although we've lightly discussed some ideas around making it more granular. (For example, keyspace-based limiting, or limiting "domains" tagged by the client in requests, could be interesting.)
>>>>>>>>>>>>> It also does not deal with inter-node traffic, of course.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Something we've not yet mentioned (that does address internode traffic) is CASSANDRA-17324 <https://issues.apache.org/jira/browse/CASSANDRA-17324>, which I proposed shortly after working on the native request limiter (and have just not had much time to return to). The basic idea is this:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> When a node is struggling under the weight of a compaction backlog and becomes a cause of increased read latency for clients, we have two safety valves:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1.) Disabling the native protocol server, which stops the node from coordinating reads and writes.
>>>>>>>>>>>>>> 2.) Jacking up the severity on the node, which tells the dynamic snitch to avoid the node for reads from other coordinators.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> These are useful, but we don't appear to have any mechanism that would allow us to temporarily reject internode hint, batch, and mutation messages that could further delay resolution of the compaction backlog.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Whether it's done as part of a larger framework or on its own, it still feels like a good idea.
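To illustrate the kind of gate CASSANDRA-17324 describes, a rough sketch of the receiving side might look like the following. The class, verb names, and threshold are invented for this example and are not the actual patch.

import java.util.Set;

// Illustrative sketch only: temporarily reject internode hint, batch, and mutation
// messages while the local compaction backlog is above a threshold, so the node can
// catch up. Names and the threshold are hypothetical, not CASSANDRA-17324 itself.
public final class CompactionBackpressureGate
{
    private static final Set<String> DEFERRABLE_VERBS = Set.of("HINT", "BATCH_STORE", "MUTATION");

    private final int maxPendingCompactions;

    public CompactionBackpressureGate(int maxPendingCompactions)
    {
        this.maxPendingCompactions = maxPendingCompactions;
    }

    // verb: the internode verb of the incoming message
    // pendingCompactions: current size of the local compaction backlog
    // Returns true if the message should be rejected so the sender can back off or fail over.
    public boolean shouldReject(String verb, int pendingCompactions)
    {
        return pendingCompactions > maxPendingCompactions && DEFERRABLE_VERBS.contains(verb);
    }
}

The interesting design questions are where the threshold comes from (static configuration vs. something adaptive) and how the sender reacts to the rejection, which is exactly where a mechanism like this could plug into a larger framework.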
>>>>>>>>>>>>> Thinking in terms of opportunity costs here (i.e. where we spend our finite engineering time to holistically improve the experience of operating this database) is healthy, but we probably haven't reached the point of diminishing returns on nodes being able to protect themselves from clients and from other nodes.
>>>>>>>>>>>>> I would just keep in mind two things:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1.) The effectiveness of rate-limiting in the system (which includes the database and all clients) as a whole necessarily decreases as we move from the application to the lowest-level database internals.
>>>>>>>>>>>>> Limiting correctly at the client will save more resources than limiting at the native protocol server, and limiting correctly at the native protocol server will save more resources than limiting after we've dispatched requests to some thread pool for processing.
>>>>>>>>>>>>> 2.) We should make sure the links between the "known" root causes of cascading failures and the mechanisms we introduce to avoid them remain very strong.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In any case, I'd be happy to help out in any way I can as this moves forward (especially as it relates to our past/current attempts to address this problem space).