I just want to flag here that this is a topic I have strong opinions on, but the CEP is not really specific or detailed enough to understand precisely how it will be implemented. So, if a patch is already being produced, most of my feedback will likely come once that patch appears, through the normal review process. I want to flag this now to avoid any surprise.
I will say upfront that, ideally, this system should be designed to have ~zero overhead when disabled, and with minimal coupling (between its own components and C* itself), so that entirely orthogonal approaches can be integrated in future without polluting the codebase.

> On 19 Sep 2024, at 19:14, Patrick McFadin <pmcfa...@gmail.com> wrote:
>
> The work has begun but we don't have a VOTE thread for this CEP. Can one get started?
>
> On Mon, May 6, 2024 at 9:24 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>> Sure, Caleb. I will include the work as part of CASSANDRA-19534 <https://issues.apache.org/jira/browse/CASSANDRA-19534> in CEP-41.
>>
>> Jaydeep
>>
>> On Fri, May 3, 2024 at 7:48 AM Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>> FYI, there is some ongoing sort-of-related work going on in CASSANDRA-19534 <https://issues.apache.org/jira/browse/CASSANDRA-19534>
>>>
>>> On Wed, Apr 10, 2024 at 6:35 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>> Just created an official CEP-41 <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-41+%28DRAFT%29+Apache+Cassandra+Unified+Rate+Limiter> incorporating the feedback from this discussion. Feel free to let me know if I may have missed some important feedback in this thread that is not captured in CEP-41.
>>>>
>>>> Jaydeep
>>>>
>>>> On Thu, Feb 22, 2024 at 11:36 AM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>>> Thanks, Josh. I will file an official CEP with all the details in a few days and update this thread with that CEP number.
>>>>> Thanks a lot, everyone, for providing valuable insights!
>>>>>
>>>>> Jaydeep
>>>>>
>>>>> On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>>> Do folks think we should file an official CEP and take it there?
>>>>>> +1 here.
>>>>>>
>>>>>> Synthesizing your gdoc, Caleb's work, and the feedback from this thread into a draft seems like a solid next step.
>>>>>>
>>>>>> On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote:
>>>>>>> I see a lot of great ideas discussed or proposed in the past to cover the most common rate-limiter use cases. Do folks think we should file an official CEP and take it there?
>>>>>>>
>>>>>>> Jaydeep
>>>>>>>
>>>>>>> On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>>>>>> I just remembered the other day that I had done a quick writeup on the state of compaction stress-related throttling in the project:
>>>>>>>
>>>>>>> https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
>>>>>>>
>>>>>>> I'm sure most of it is old news to the people on this thread, but I figured I'd post it just in case :)
>>>>>>>
>>>>>>> On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>>>
>>>>>>>> 2.) We should make sure the links between the "known" root causes of cascading failures and the mechanisms we introduce to avoid them remain very strong.
>>>>>>> It seems to me that our historical strategy has been to address individual known cases one by one rather than looking for a more holistic load-balancing and load-shedding solution. While the engineer in me likes the elegance of a broad, more inclusive, SEDA-like approach, the pragmatist in me wonders how far we think we are today from a stable set-point.
>>>>>>>
>>>>>>> i.e., are we facing a handful of cases where nodes can still get pushed over and then cascade, which we can surgically address, or are we facing a broader lack of back-pressure that rears its head in different domains (client -> coordinator, coordinator -> replica, internode with other operations, etc.) at surprising times and should be considered more holistically?
>>>>>>>
>>>>>>> On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
>>>>>>>> I almost forgot CASSANDRA-15817, which introduced reject_repair_compaction_threshold, a mechanism to stop repairs while compaction is underwater.
>>>>>>>>
>>>>>>>>> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hey all,
>>>>>>>>>
>>>>>>>>> I'm a bit late to the discussion. I see that we've already discussed CASSANDRA-15013 <https://issues.apache.org/jira/browse/CASSANDRA-15013> and CASSANDRA-16663 <https://issues.apache.org/jira/browse/CASSANDRA-16663>, at least in passing. Having written the latter, I'd be the first to admit it's a crude tool, although it's been useful here and there, and it provides a couple of primitives that may be useful for future work. As Scott mentions, while it is configurable at runtime, it is not adaptive, although we did make configuration easier in CASSANDRA-17423 <https://issues.apache.org/jira/browse/CASSANDRA-17423>. It is also global to the node, although we've lightly discussed some ideas around making it more granular. (For example, keyspace-based limiting, or limiting "domains" tagged by the client in requests, could be interesting.) It also does not deal with inter-node traffic, of course.
>>>>>>>>>
>>>>>>>>> Something we've not yet mentioned (that does address internode traffic) is CASSANDRA-17324 <https://issues.apache.org/jira/browse/CASSANDRA-17324>, which I proposed shortly after working on the native request limiter (and have just not had much time to return to). The basic idea is this:
>>>>>>>>>
>>>>>>>>> When a node is struggling under the weight of a compaction backlog and becomes a cause of increased read latency for clients, we have two safety valves:
>>>>>>>>>
>>>>>>>>> 1.) Disabling the native protocol server, which stops the node from coordinating reads and writes.
>>>>>>>>> 2.) Jacking up the severity on the node, which tells the dynamic snitch to avoid the node for reads from other coordinators.
>>>>>>>>>
>>>>>>>>> These are useful, but we don't appear to have any mechanism that would allow us to temporarily reject internode hint, batch, and mutation messages that could further delay resolution of the compaction backlog.
>>>>>>>>>
>>>>>>>>> Whether it's done as part of a larger framework or on its own, it still feels like a good idea.
>>>>>>>>>
>>>>>>>>> Thinking in terms of opportunity costs here (i.e., where we spend our finite engineering time to holistically improve the experience of operating this database) is healthy, but we probably haven't reached the point of diminishing returns on nodes being able to protect themselves from clients and from other nodes. I would just keep two things in mind:
>>>>>>>>>
>>>>>>>>> 1.) The effectiveness of rate limiting in the system (which includes the database and all clients) as a whole necessarily decreases as we move from the application to the lowest-level database internals. Limiting correctly at the client will save more resources than limiting at the native protocol server, and limiting correctly at the native protocol server will save more resources than limiting after we've dispatched requests to some thread pool for processing.
>>>>>>>>> 2.) We should make sure the links between the "known" root causes of cascading failures and the mechanisms we introduce to avoid them remain very strong.
>>>>>>>>>
>>>>>>>>> In any case, I'd be happy to help out in any way I can as this moves forward (especially as it relates to our past/current attempts to address this problem space).
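[Editor's note] To make the "~zero overhead when disabled" principle from the top of this thread concrete, here is a minimal, purely hypothetical Java sketch (not the CEP-41 design, and none of these class or method names come from the Cassandra codebase): the limiter is pluggable behind a tiny interface, the disabled state is a stateless no-op instance, and the hot path pays only a volatile read plus a trivial virtual call when the feature is off.

```java
// Hypothetical sketch of a pluggable request limiter with a near-free disabled path.
public class RequestLimiterSketch {
    /** Pluggable limiter; a no-op instance doubles as the "disabled" state. */
    interface Limiter {
        boolean tryAcquire();
    }

    /** Disabled state: no locks, no counters, no allocation on the hot path. */
    static final Limiter NO_OP = () -> true;

    /** A deliberately naive token bucket, standing in for a real adaptive limiter. */
    static final class TokenBucket implements Limiter {
        private final long capacity;
        private long tokens;

        TokenBucket(long capacity) {
            this.capacity = capacity;
            this.tokens = capacity;
        }

        @Override
        public synchronized boolean tryAcquire() {
            if (tokens > 0) { tokens--; return true; }
            return false; // budget exhausted: shed the request
        }

        synchronized void refill() { tokens = capacity; }
    }

    // Swapping this reference enables/disables limiting without touching callers,
    // keeping the coupling between the limiter and the request path minimal.
    private static volatile Limiter limiter = NO_OP;

    static void setLimiter(Limiter l) { limiter = l; }

    /** Hot path: one volatile read, then delegate. */
    static boolean admit() { return limiter.tryAcquire(); }

    public static void main(String[] args) {
        // Disabled: everything is admitted.
        int admittedWhileDisabled = 0;
        for (int i = 0; i < 10; i++) if (admit()) admittedWhileDisabled++;

        // Enabled with a 3-token budget: requests beyond it are shed.
        setLimiter(new TokenBucket(3));
        int admittedWhileEnabled = 0;
        for (int i = 0; i < 10; i++) if (admit()) admittedWhileEnabled++;

        System.out.println("disabled=" + admittedWhileDisabled
                           + " enabled=" + admittedWhileEnabled);
        // prints "disabled=10 enabled=3"
    }
}
```

An orthogonal approach (say, an adaptive or keyspace-scoped limiter) would just be another Limiter implementation swapped in behind the same reference, which is one way to read the "minimal coupling" requirement above.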