Thanks, Josh. I will file an official CEP with all the details in a few days and update this thread with that CEP number. Thanks a lot everyone for providing valuable insights!
Jaydeep

On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie <jmcken...@apache.org> wrote:

> Do folks think we should file an official CEP and take it there?
>
> +1 here.
>
> Synthesizing your gdoc, Caleb's work, and the feedback from this thread
> into a draft seems like a solid next step.
>
> On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote:
>
> I see a lot of great ideas being discussed or proposed in the past to
> cover the most common rate limiter candidate use cases. Do folks think we
> should file an official CEP and take it there?
>
> Jaydeep
>
> On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe <calebrackli...@gmail.com>
> wrote:
>
> I just remembered the other day that I had done a quick writeup on the
> state of compaction stress-related throttling in the project:
>
> https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
>
> I'm sure most of it is old news to the people on this thread, but I
> figured I'd post it just in case :)
>
> On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie <jmcken...@apache.org>
> wrote:
>
> 2.) We should make sure the links between the "known" root causes of
> cascading failures and the mechanisms we introduce to avoid them remain
> very strong.
>
> Seems to me that our historical strategy was to address individual known
> cases one-by-one rather than looking for a more holistic load-balancing and
> load-shedding solution. While the engineer in me likes the elegance of a
> broad, more-inclusive *actual SEDA-like* approach, the pragmatist in me
> wonders how far we think we are today from a stable set-point.
>
> i.e. are we facing a handful of cases where nodes can still get pushed
> over and then cascade that we can surgically address, or are we facing a
> broader lack of back-pressure that rears its head in different domains
> (client -> coordinator, coordinator -> replica, internode with other
> operations, etc) at surprising times and should be considered more
> holistically?
>
> On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
>
> I almost forgot CASSANDRA-15817, which introduced
> reject_repair_compaction_threshold, which provides a mechanism to stop
> repairs while compaction is underwater.
>
> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe <calebrackli...@gmail.com>
> wrote:
>
> Hey all,
>
> I'm a bit late to the discussion. I see that we've already discussed
> CASSANDRA-15013 <https://issues.apache.org/jira/browse/CASSANDRA-15013>
> and CASSANDRA-16663
> <https://issues.apache.org/jira/browse/CASSANDRA-16663> at least in
> passing. Having written the latter, I'd be the first to admit it's a crude
> tool, although it's been useful here and there, and provides a couple
> primitives that may be useful for future work. As Scott mentions, while it
> is configurable at runtime, it is not adaptive, although we did
> make configuration easier in CASSANDRA-17423
> <https://issues.apache.org/jira/browse/CASSANDRA-17423>. It also is
> global to the node, although we've lightly discussed some ideas around
> making it more granular. (For example, keyspace-based limiting, or limiting
> "domains" tagged by the client in requests, could be interesting.) It also
> does not deal with inter-node traffic, of course.
>
> Something we've not yet mentioned (that does address internode traffic) is
> CASSANDRA-17324 <https://issues.apache.org/jira/browse/CASSANDRA-17324>,
> which I proposed shortly after working on the native request limiter (and
> have just not had much time to return to). The basic idea is this:
>
> When a node is struggling under the weight of a compaction backlog and
> becomes a cause of increased read latency for clients, we have two safety
> valves:
>
> 1.) Disabling the native protocol server, which stops the node from
> coordinating reads and writes.
> 2.) Jacking up the severity on the node, which tells the dynamic snitch to
> avoid the node for reads from other coordinators.
>
> These are useful, but we don’t appear to have any mechanism that would
> allow us to temporarily reject internode hint, batch, and mutation messages
> that could further delay resolution of the compaction backlog.
>
> Whether it's done as part of a larger framework or on its own, it still
> feels like a good idea.
>
> Thinking in terms of opportunity costs here (i.e. where we spend our
> finite engineering time to holistically improve the experience of operating
> this database) is healthy, but we probably haven't reached the point of
> diminishing returns on nodes being able to protect themselves from clients
> and from other nodes. I would just keep in mind two things:
>
> 1.) The effectiveness of rate-limiting in the system (which includes the
> database and all clients) as a whole necessarily decreases as we move from
> the application to the lowest-level database internals. Limiting correctly
> at the client will save more resources than limiting at the native protocol
> server, and limiting correctly at the native protocol server will save more
> resources than limiting after we've dispatched requests to some thread pool
> for processing.
> 2.) We should make sure the links between the "known" root causes of
> cascading failures and the mechanisms we introduce to avoid them remain
> very strong.
>
> In any case, I'd be happy to help out in any way I can as this moves
> forward (especially as it relates to our past/current attempts to address
> this problem space).
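
For anyone skimming the thread, here is a rough sketch of the CASSANDRA-17324 idea quoted above: a node temporarily refuses internode hint, batch, and mutation messages while its compaction backlog is over some limit. This is not code from the ticket or from Cassandra itself. The `Verb` enum, the `BacklogGate` class, and the threshold value are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Verb(Enum):
    """Hypothetical internode message types (not Cassandra's real Verb enum)."""
    READ = auto()
    MUTATION = auto()
    HINT = auto()
    BATCH = auto()


# Message types that add write work and could deepen a compaction backlog.
REJECTABLE = frozenset({Verb.MUTATION, Verb.HINT, Verb.BATCH})


@dataclass
class BacklogGate:
    """Rejects write-path internode messages while compaction is behind."""
    pending_compactions_threshold: int  # assumed tuning knob, not a real setting

    def should_accept(self, verb: Verb, pending_compactions: int) -> bool:
        # Reads pass through here; the thread notes they are already steered
        # away from a struggling node via dynamic snitch severity.
        if verb not in REJECTABLE:
            return True
        # Accept write-path messages only while the backlog is at or below
        # the threshold.
        return pending_compactions <= self.pending_compactions_threshold


gate = BacklogGate(pending_compactions_threshold=100)
```

A real implementation would presumably also need to decide how the sender reacts to a rejection (retry, hint, or fail the write), which this sketch leaves out.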