> Personally, I’m a bit skeptical that we will come up with a metric-based 
> heuristic that works well in most scenarios and doesn’t require significant 
> knowledge and tuning. I think past implementations of the dynamic snitch are 
> good evidence of that.

I am more optimistic on that front. I think we can achieve a lot. However, in my 
opinion, we need to focus on balancing the load rather than rate limiting. Rate 
limiting is going to be important if/when we decide to implement workload 
isolation. Until then, I think we should focus on three things:

  * Node health (nodes should produce useful work and should remain stable and 
not overloaded)
  * Latency (we always need to find an optimal way to process requests and 
minimize overall queueing time)
  * Fairness (avoid workload and utilization imbalances)

All three points are achievable with very straightforward approaches that will 
not require much operator involvement.
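
To make the "straightforward" claim concrete, here is the kind of thing I have 
in mind, as a purely illustrative sketch (all names and weights are 
hypothetical, not a proposed API): score each replica on the three signals 
above and route the request to the best one.

    import java.util.Comparator;
    import java.util.List;

    final class ReplicaStats {
        final boolean overloaded;       // node health signal
        final double emaLatencyMillis;  // smoothed service latency
        final int inFlightRequests;     // proxy for utilization imbalance

        ReplicaStats(boolean overloaded, double emaLatencyMillis, int inFlight) {
            this.overloaded = overloaded;
            this.emaLatencyMillis = emaLatencyMillis;
            this.inFlightRequests = inFlight;
        }

        // Higher is better; an overloaded replica is never preferred.
        double score() {
            double health = overloaded ? 0.0 : 1.0;
            double latency = 1.0 / (1.0 + emaLatencyMillis);
            double fairness = 1.0 / (1.0 + inFlightRequests);
            return health * latency * fairness;
        }
    }

    final class ReplicaSelection {
        static ReplicaStats pick(List<ReplicaStats> replicas) {
            return replicas.stream()
                           .max(Comparator.comparingDouble(ReplicaStats::score))
                           .orElseThrow();
        }
    }

Nothing here requires operator tuning; the signals are all things the 
coordinator can already observe.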

I guess my main point is that we need to solve load balancing (summarized by the 
above three points) before we start working on the rate limiter. There's a good 
chance we won't need one at all, apart from use cases that require workload 
isolation. 


On Fri, Sep 20, 2024, at 8:14 PM, Jordan West wrote:
> +1 to Benedict’s (and others’) comments on pluggability and low overhead when 
> disabled. The latter I think needs little justification. The reason I am big 
> on the former is that, in my opinion, decisions on approach need to be settled 
> with numbers, not anecdotes or past experience (including my own). So I would 
> like to see us compare different approaches (what metrics to use, etc.). 
> 
> Personally, I’m a bit skeptical that we will come up with a metric-based 
> heuristic that works well in most scenarios and doesn’t require significant 
> knowledge and tuning. I think past implementations of the dynamic snitch are 
> good evidence of that. However, I expressed the same concerns internally for 
> a client-level project where we exposed metrics to induce back pressure, and 
> early experiments are encouraging / contrary to my expectations. At different 
> layers, different approaches can work better or worse. Same with different 
> workloads. I don’t think we should dismiss approaches outright in this 
> thread without hard numbers. 
> 
> In short, I think the testing and evaluation of this CEP is as important as 
> its design and implementation. We will need to test a wide variety of 
> workloads and potentially implementations, and that’s where pluggability will 
> be a huge benefit. I would go as far as saying the CEP should focus more on a 
> framework for pluggable implementations that has low to zero cost when 
> disabled than on a specific set of metrics or a specific approach. 
> 
> Jordan 
> 
> On Thu, Sep 19, 2024 at 14:38 Benedict Elliott Smith <bened...@apache.org> 
> wrote:
>> I just want to flag here that this is a topic I have strong opinions on, but 
>> the CEP is not really specific or detailed enough to understand precisely 
>> how it will be implemented. So, if a patch is already being produced, most 
>> of my feedback is likely to come some time after that patch appears, 
>> through the normal review process. I want to flag this now to avoid any 
>> surprises.
>> 
>> I will say upfront that, ideally, this system should be designed to 
>> have ~zero overhead when disabled, and with minimal coupling (between its 
>> own components and C* itself), so that entirely orthogonal approaches can be 
>> integrated in future without polluting the codebase.
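>> 
>> As a rough illustration of the ~zero-overhead shape I mean (all names 
>> hypothetical, just a sketch, not a proposed API): hide the limiter behind 
>> one narrow interface whose disabled implementation is a stateless constant, 
>> so the JIT can inline the call sites away when the feature is off.
>> 
>>     interface RequestLimiter {
>>         boolean tryAcquire();
>> 
>>         // Stateless, allocation-free no-op; trivially inlineable.
>>         RequestLimiter DISABLED = () -> true;
>>     }
>> 
>>     final class Coordinator {
>>         // Chosen once at startup from config; a static final reference
>>         // lets the JIT constant-fold the disabled path.
>>         private static final RequestLimiter LIMITER = RequestLimiter.DISABLED;
>> 
>>         void handle(Runnable request) {
>>             if (!LIMITER.tryAcquire())
>>                 throw new IllegalStateException("overloaded, request rejected");
>>             request.run();
>>         }
>>     }
>> 
>> The point is only the shape: no coupling to the rest of C*, and entirely 
>> different implementations can plug in behind the interface later.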
>> 
>> 
>>> On 19 Sep 2024, at 19:14, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>> 
>>> The work has begun but we don't have a VOTE thread for this CEP. Can one 
>>> get started?
>>> 
>>> On Mon, May 6, 2024 at 9:24 PM Jaydeep Chovatia 
>>> <chovatia.jayd...@gmail.com> wrote:
>>>> Sure, Caleb. I will include the work as part of CASSANDRA-19534 
>>>> <https://issues.apache.org/jira/browse/CASSANDRA-19534> in CEP-41.
>>>> 
>>>> Jaydeep
>>>> 
>>>> On Fri, May 3, 2024 at 7:48 AM Caleb Rackliffe <calebrackli...@gmail.com> 
>>>> wrote:
>>>>> FYI, there is some sort-of-related work going on in 
>>>>> CASSANDRA-19534 <https://issues.apache.org/jira/browse/CASSANDRA-19534>
>>>>> 
>>>>> On Wed, Apr 10, 2024 at 6:35 PM Jaydeep Chovatia 
>>>>> <chovatia.jayd...@gmail.com> wrote:
>>>>>> Just created an official CEP-41 
>>>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-41+%28DRAFT%29+Apache+Cassandra+Unified+Rate+Limiter>
>>>>>>  incorporating the feedback from this discussion. Feel free to let me 
>>>>>> know if I may have missed some important feedback in this thread that is 
>>>>>> not captured in CEP-41.
>>>>>> 
>>>>>> Jaydeep
>>>>>> 
>>>>>> On Thu, Feb 22, 2024 at 11:36 AM Jaydeep Chovatia 
>>>>>> <chovatia.jayd...@gmail.com> wrote:
>>>>>>> Thanks, Josh. I will file an official CEP with all the details in a few 
>>>>>>> days and update this thread with that CEP number.
>>>>>>> Thanks a lot, everyone, for providing valuable insights!
>>>>>>> 
>>>>>>> Jaydeep
>>>>>>> 
>>>>>>> On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie <jmcken...@apache.org> 
>>>>>>> wrote:
>>>>>>>>> Do folks think we should file an official CEP and take it there?
>>>>>>>> +1 here.
>>>>>>>> 
>>>>>>>> Synthesizing your gdoc, Caleb's work, and the feedback from this 
>>>>>>>> thread into a draft seems like a solid next step.
>>>>>>>> 
>>>>>>>> On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote:
>>>>>>>>> I see a lot of great ideas that have been discussed or proposed in 
>>>>>>>>> the past to cover the most common rate limiter candidate use cases. 
>>>>>>>>> Do folks think we should file an official CEP and take it there?
>>>>>>>>> 
>>>>>>>>> Jaydeep
>>>>>>>>> 
>>>>>>>>> On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe 
>>>>>>>>> <calebrackli...@gmail.com> wrote:
>>>>>>>>>> I just remembered the other day that I had done a quick writeup on 
>>>>>>>>>> the state of compaction stress-related throttling in the project:
>>>>>>>>>> 
>>>>>>>>>> https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
>>>>>>>>>> 
>>>>>>>>>> I'm sure most of it is old news to the people on this thread, but I 
>>>>>>>>>> figured I'd post it just in case :)
>>>>>>>>>> 
>>>>>>>>>> On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie 
>>>>>>>>>> <jmcken...@apache.org> wrote:
>>>>>>>>>>>> 2.) We should make sure the links between the "known" root causes 
>>>>>>>>>>>> of cascading failures and the mechanisms we introduce to avoid 
>>>>>>>>>>>> them remain very strong.
>>>>>>>>>>> Seems to me that our historical strategy was to address individual 
>>>>>>>>>>> known cases one-by-one rather than looking for a more holistic 
>>>>>>>>>>> load-balancing and load-shedding solution. While the engineer in me 
>>>>>>>>>>> likes the elegance of a broad, more-inclusive *actual SEDA-like* 
>>>>>>>>>>> approach, the pragmatist in me wonders how far we think we are 
>>>>>>>>>>> today from a stable set-point.
>>>>>>>>>>> 
>>>>>>>>>>> i.e. are we facing a handful of cases where nodes can still get 
>>>>>>>>>>> pushed over and then cascade that we can surgically address, or are 
>>>>>>>>>>> we facing a broader lack of back-pressure that rears its head in 
>>>>>>>>>>> different domains (client -> coordinator, coordinator -> replica, 
>>>>>>>>>>> internode with other operations, etc) at surprising times and 
>>>>>>>>>>> should be considered more holistically?
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
>>>>>>>>>>>> I almost forgot CASSANDRA-15817, which introduced 
>>>>>>>>>>>> reject_repair_compaction_threshold, a mechanism to 
>>>>>>>>>>>> stop repairs while compaction is underwater.
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe 
>>>>>>>>>>>>> <calebrackli...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm a bit late to the discussion. I see that we've already 
>>>>>>>>>>>>> discussed CASSANDRA-15013 
>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-15013> and 
>>>>>>>>>>>>> CASSANDRA-16663 
>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-16663> at least 
>>>>>>>>>>>>> in passing. Having written the latter, I'd be the first to admit 
>>>>>>>>>>>>> it's a crude tool, although it's been useful here and there, and 
>>>>>>>>>>>>> provides a couple of primitives that may be useful for future work. 
>>>>>>>>>>>>> As Scott mentions, while it is configurable at runtime, it is not 
>>>>>>>>>>>>> adaptive, although we did make configuration easier in 
>>>>>>>>>>>>> CASSANDRA-17423 
>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-17423>. It is also 
>>>>>>>>>>>>> global to the node, although we've lightly discussed some 
>>>>>>>>>>>>> ideas around making it more granular. (For example, 
>>>>>>>>>>>>> keyspace-based limiting, or limiting "domains" tagged by the 
>>>>>>>>>>>>> client in requests, could be interesting.) It also does not deal 
>>>>>>>>>>>>> with inter-node traffic, of course.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Something we've not yet mentioned (that does address internode 
>>>>>>>>>>>>> traffic) is CASSANDRA-17324 
>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-17324>, which I 
>>>>>>>>>>>>> proposed shortly after working on the native request limiter (and 
>>>>>>>>>>>>> have just not had much time to return to). The basic idea is this:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> When a node is struggling under the weight of a compaction 
>>>>>>>>>>>>>> backlog and becomes a cause of increased read latency for 
>>>>>>>>>>>>>> clients, we have two safety valves:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 1.) Disabling the native protocol server, which stops the node 
>>>>>>>>>>>>>> from coordinating reads and writes.
>>>>>>>>>>>>>> 2.) Jacking up the severity on the node, which tells the dynamic 
>>>>>>>>>>>>>> snitch to avoid the node for reads from other coordinators.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> These are useful, but we don’t appear to have any mechanism that 
>>>>>>>>>>>>>> would allow us to temporarily reject internode hint, batch, and 
>>>>>>>>>>>>>> mutation messages that could further delay resolution of the 
>>>>>>>>>>>>>> compaction backlog.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Whether it's done as part of a larger framework or on its own, it 
>>>>>>>>>>>>> still feels like a good idea.
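>>>>>>>>>>>>> 
>>>>>>>>>>>>> To sketch the basic shape (purely illustrative, hypothetical 
>>>>>>>>>>>>> names, not what a patch would actually look like): gate the 
>>>>>>>>>>>>> write-path verbs on the pending compaction backlog, while 
>>>>>>>>>>>>> continuing to serve everything else.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     enum Verb { HINT, BATCH, MUTATION, READ }
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     final class CompactionBacklogGate {
>>>>>>>>>>>>>         private final int maxPendingCompactions;
>>>>>>>>>>>>> 
>>>>>>>>>>>>>         CompactionBacklogGate(int maxPendingCompactions) {
>>>>>>>>>>>>>             this.maxPendingCompactions = maxPendingCompactions;
>>>>>>>>>>>>>         }
>>>>>>>>>>>>> 
>>>>>>>>>>>>>         // Reject hint/batch/mutation messages while the backlog is
>>>>>>>>>>>>>         // over the threshold; senders already know how to retry or
>>>>>>>>>>>>>         // write hints, so this degrades gracefully.
>>>>>>>>>>>>>         boolean shouldReject(Verb verb, int pendingCompactions) {
>>>>>>>>>>>>>             if (pendingCompactions <= maxPendingCompactions)
>>>>>>>>>>>>>                 return false;
>>>>>>>>>>>>>             switch (verb) {
>>>>>>>>>>>>>                 case HINT:
>>>>>>>>>>>>>                 case BATCH:
>>>>>>>>>>>>>                 case MUTATION:
>>>>>>>>>>>>>                     return true;
>>>>>>>>>>>>>                 default:
>>>>>>>>>>>>>                     return false; // keep serving reads, gossip, etc.
>>>>>>>>>>>>>             }
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>     }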
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thinking in terms of opportunity costs here (i.e. where we spend 
>>>>>>>>>>>>> our finite engineering time to holistically improve the 
>>>>>>>>>>>>> experience of operating this database) is healthy, but we 
>>>>>>>>>>>>> probably haven't reached the point of diminishing returns on 
>>>>>>>>>>>>> nodes being able to protect themselves from clients and from 
>>>>>>>>>>>>> other nodes. I would just keep in mind two things:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1.) The effectiveness of rate-limiting in the system (which 
>>>>>>>>>>>>> includes the database and all clients) as a whole necessarily 
>>>>>>>>>>>>> decreases as we move from the application to the lowest-level 
>>>>>>>>>>>>> database internals. Limiting correctly at the client will save 
>>>>>>>>>>>>> more resources than limiting at the native protocol server, and 
>>>>>>>>>>>>> limiting correctly at the native protocol server will save more 
>>>>>>>>>>>>> resources than limiting after we've dispatched requests to some 
>>>>>>>>>>>>> thread pool for processing.
>>>>>>>>>>>>> 2.) We should make sure the links between the "known" root causes 
>>>>>>>>>>>>> of cascading failures and the mechanisms we introduce to avoid 
>>>>>>>>>>>>> them remain very strong.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In any case, I'd be happy to help out in any way I can as this 
>>>>>>>>>>>>> moves forward (especially as it relates to our past/current 
>>>>>>>>>>>>> attempts to address this problem space).
