Oh, one last thing. If the client drivers implemented a rate limiter based on each node's error rate and had the ability to back off, then, paired with CASSANDRA-19534 <https://issues.apache.org/jira/browse/CASSANDRA-19534>, I think the majority of severe cluster outages that people experience would simply disappear, making fast rejection of requests on the server side more of an optimization than a necessity.
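For illustration, a minimal sketch of the driver-side half of that idea: a per-node tracker the driver consults before sending, which shrinks its budget as a node's error rate climbs. All names here are hypothetical; this is not the actual Java driver API.

    import java.util.concurrent.atomic.AtomicLong;

    /**
     * Hypothetical per-node limiter: shrink the allowed request rate as the
     * node's recent error rate grows, recover slowly while it stays healthy.
     */
    public final class NodeErrorRateLimiter {
        private final AtomicLong successes = new AtomicLong();
        private final AtomicLong errors = new AtomicLong();
        private volatile double permitsPerSecond = 1000.0; // assumed starting budget

        public void recordSuccess() { successes.incrementAndGet(); }
        public void recordError()   { errors.incrementAndGet(); }

        /** Called periodically (say, once per second) to adapt the budget. */
        public void adapt() {
            long ok = successes.getAndSet(0);
            long bad = errors.getAndSet(0);
            double errorRate = (ok + bad) == 0 ? 0.0 : (double) bad / (ok + bad);
            if (errorRate > 0.05)
                permitsPerSecond = Math.max(10.0, permitsPerSecond * 0.5);       // back off multiplicatively
            else
                permitsPerSecond = Math.min(100_000.0, permitsPerSecond + 50.0); // recover additively
        }

        public double permitsPerSecond() { return permitsPerSecond; }
    }

The shape is just AIMD: back off hard when a node starts erroring, creep back up once it stops, which is also roughly the shape of the adaptive concurrency limits in the Netflix post below.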
There was a nice Netflix post about it here [1], and it was the inspiration behind the latency target mode that's in easy-cass-stress.

[1] https://netflixtechblog.medium.com/performance-under-load-3e6fa9a60581

Jon

On Sat, Sep 21, 2024 at 12:38 PM Jon Haddad <j...@rustyrazorblade.com> wrote:

> Can you elaborate on what “the bad” is here? Maybe a scenario would help. I’m trying to visualize what kind of workload would be running where you wouldn’t have timeouts or a deep queue yet a node is overloaded. What is “the bad” if requests aren’t timing out? How is a node overloaded if there isn’t work queued up?
>
> The levers we have under our control are concurrent work and the queue of work. Applying a rate limiter to the queue is a roundabout way of limiting the queue depth, so I view it as unnecessary in a world where the queue has a limit.
>
> Dynamically sizing the depth of the queue based on timeouts allows us to adjust it based on the heuristic of the amount of work that was able to be completed successfully in the past. If you know that the queue has seen timeouts when it gets deeper than 100k requests, you stop accepting requests once 100k is reached. The rate of rejection will be roughly correlated with the number of requests that would otherwise time out. Basically we say: we can’t serve you, so don’t even get in line, and we use the length of the line as our throttle.
>
> FWIW, I fully acknowledge that I didn’t invent any of this; it’s how hardware works already. I’m just applying the same concepts to our software queues that CPUs, disks, and network cards already use successfully.
>
> Jon
>
> On Sat, Sep 21, 2024 at 12:05 PM Jordan West <jorda...@gmail.com> wrote:
>
>> I agree with Josh. We need to be able to protect from a sudden burst of traffic. CASSANDRA-19534 went a long way in that regard — at least with respect to minimizing the effects. The challenge with latency and queue depths can be that they trigger when the bad has already occurred.
>>
>> One other thing we are considering internally in that project I mentioned is having separate thresholds: one that starts to slow traffic and one that rejects it.
>>
>> All this said, I think our intuition can guide us to initial implementations, but I’m still pretty adamant that our decisions should be guided by testing and numbers for this CEP.
>>
>> Jordan
>>
>> On Sat, Sep 21, 2024 at 04:36 Josh McKenzie <jmcken...@apache.org> wrote:
>>
>>> Are those three sufficient to protect against a client that unexpectedly comes up with 100x a previously provisioned-for workload? Or 100 clients at 100x concurrently? Given that can be 100x in terms of quantity (helped by queueing and shedding), but also 100x in terms of *computational and disk implications*? We don’t have control over what users do, and while in an ideal world client applications have fairly predictable workloads over time, in practice things spike around various events for different applications.
>>>
>>> i.e. 1 thing in a queue could produce orders of magnitude more work than 1 other thing in a queue. Especially as we move into a world with SAI and Accord.
>>>
>>> we need to solve load balancing (summarized by the above three points) before we start working on the rate limiter
>>>
>>> I see these 2 things as complementary, not as interdependent. Is there something I’m missing?
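A minimal sketch of the timeout-driven queue cap Jon describes above. The class is hypothetical rather than Cassandra's actual thread-pool internals, and OverloadedException here is a local stand-in for the wire protocol's Overloaded error:

    import java.util.concurrent.atomic.AtomicInteger;

    /**
     * Hypothetical bounded-queue admission check: lower the cap when recent
     * work timed out, raise it slowly while work completes cleanly.
     */
    public final class AdaptiveQueueLimit {
        private final AtomicInteger depth = new AtomicInteger();
        private volatile int maxDepth = 100_000; // learned ceiling, e.g. Jon's 100k

        /** Reject at the door instead of letting the request time out in line. */
        public void tryEnqueue() {
            if (depth.incrementAndGet() > maxDepth) {
                depth.decrementAndGet();
                throw new OverloadedException("queue full; shedding instead of queueing");
            }
        }

        /** Call when a dequeued request finishes or times out. */
        public void onComplete(boolean timedOut) {
            depth.decrementAndGet();
            maxDepth = timedOut ? Math.max(1_000, (int) (maxDepth * 0.9)) // line was too long: shrink
                                : Math.min(1_000_000, maxDepth + 10);     // healthy: creep back up
        }

        /** Stand-in for the native protocol's Overloaded error. */
        public static final class OverloadedException extends RuntimeException {
            OverloadedException(String message) { super(message); }
        }
    }

Shrinking multiplicatively on timeouts and growing additively on success keeps the cap near the deepest queue the node has recently been able to drain within the timeout, which is exactly the "use the length of the line as our throttle" behavior.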
>>> At least for me, mentally I frame this as: “How can nodes protect themselves from variable and changing user behavior in a way that’s minimally disruptive to the user and requires as little configuration as possible for operators?” Basically, how do we keep the limits of node performance from leaking into the scope of user awareness and responsibility, beyond simply pushing various exceptions to the client to indicate what’s going on (OverloadedException to the client, etc.)?
>>>
>>> It seems to me both rate limiting and resource balancing are integral parts of this, but also parts that could be worked on independently. Were we to wave a magic wand tomorrow and have robust rate limiting, then as we improved load balancing over time it would raise the ceiling at which rate limiting kicked in.
>>>
>>> So, concretely to the thread, I think I agree with Jon:
>>>
>>> * use the rate of timeouts to limit the depth of the queues for each of the thread pools
>>> * reject requests when the queue is full with an OverloadedException
>>>
>>> followed by:
>>>
>>> If you want to follow this up with the ability to dynamically resize thread pools, that could be interesting.
>>>
>>> Simple is very much a feature here.
>>>
>>> On Sat, Sep 21, 2024, at 5:20 AM, Alex Petrov wrote:
>>>
>>> > Personally, I’m a bit skeptical that we will come up with a metric-based heuristic that works well in most scenarios and doesn’t require significant knowledge and tuning. I think past implementations of the dynamic snitch are good evidence of that.
>>>
>>> I am more optimistic on that front. I think we can achieve a lot. However, in my opinion, we need to focus on balancing the load rather than on rate limiting. Rate limiting is going to be important if/when we decide to implement workload isolation. Until then, I think we should focus on three things:
>>>
>>> * Node health (nodes should produce useful work and should be stable and not overloaded)
>>> * Latency (we always need to find an optimal way to process requests and minimize overall queueing time)
>>> * Fairness (avoid workload and utilization imbalances)
>>>
>>> All three points are achievable with very straightforward approaches that will not require much operator involvement.
>>>
>>> I guess my main point is that we need to solve load balancing (summarized by the above three points) before we start working on the rate limiter, but there’s a good chance we may not need one apart from use cases that require workload isolation.
>>>
>>> On Fri, Sep 20, 2024, at 8:14 PM, Jordan West wrote:
>>>
>>> +1 to Benedict’s (and others’) comments on pluggability and low overhead when disabled. The latter, I think, needs little justification. The reason I am big on the former is, in my opinion, that decisions on approach need to be settled with numbers, not anecdotes or past experience (including my own). So I would like to see us compare different approaches (what metrics to use, etc.).
>>>
>>> Personally, I’m a bit skeptical that we will come up with a metric-based heuristic that works well in most scenarios and doesn’t require significant knowledge and tuning. I think past implementations of the dynamic snitch are good evidence of that. However, I expressed the same concerns internally for a client-level project where we exposed metrics to induce back pressure, and early experiments are encouraging / contrary to my expectations.
>>> At different layers, different approaches can work better or worse. Same with different workloads. I don’t think we should dismiss approaches outright in this thread without hard numbers.
>>>
>>> In short, I think the testing and evaluation of this CEP is as important as its design and implementation. We will need to test a wide variety of workloads and potentially implementations, and that’s where pluggability will be a huge benefit. I would go as far as saying the CEP should focus more on a framework for pluggable implementations that has low to zero cost when disabled than on a specific set of metrics to use or a specific approach.
>>>
>>> Jordan
>>>
>>> On Thu, Sep 19, 2024 at 14:38 Benedict Elliott Smith <bened...@apache.org> wrote:
>>>
>>> I just want to flag here that this is a topic I have strong opinions on, but the CEP is not really specific or detailed enough to understand precisely how it will be implemented. So, if a patch is already being produced, most of my feedback is likely to be provided some time after a patch appears, through the normal review process. I want to flag this now to avoid any surprise.
>>>
>>> I will say upfront that, ideally, this system should be designed to have ~zero overhead when disabled, and with minimal coupling (between its own components and C* itself), so that entirely orthogonal approaches can be integrated in the future without polluting the codebase.
>>>
>>> On 19 Sep 2024, at 19:14, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>
>>> The work has begun, but we don’t have a VOTE thread for this CEP. Can one get started?
>>>
>>> On Mon, May 6, 2024 at 9:24 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>
>>> Sure, Caleb. I will include the work as part of CASSANDRA-19534 <https://issues.apache.org/jira/browse/CASSANDRA-19534> in CEP-41.
>>>
>>> Jaydeep
>>>
>>> On Fri, May 3, 2024 at 7:48 AM Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>>
>>> FYI, there is some ongoing, sort-of-related work in CASSANDRA-19534 <https://issues.apache.org/jira/browse/CASSANDRA-19534>.
>>>
>>> On Wed, Apr 10, 2024 at 6:35 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>
>>> Just created the official CEP-41 <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-41+%28DRAFT%29+Apache+Cassandra+Unified+Rate+Limiter>, incorporating the feedback from this discussion. Feel free to let me know if I may have missed some important feedback in this thread that is not captured in CEP-41.
>>>
>>> Jaydeep
>>>
>>> On Thu, Feb 22, 2024 at 11:36 AM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>>>
>>> Thanks, Josh. I will file an official CEP with all the details in a few days and update this thread with that CEP number. Thanks a lot, everyone, for providing valuable insights!
>>>
>>> Jaydeep
>>>
>>> On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>>
>>> Do folks think we should file an official CEP and take it there?
>>>
>>> +1 here.
>>>
>>> Synthesizing your gdoc, Caleb’s work, and the feedback from this thread into a draft seems like a solid next step.
>>>
>>> On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote:
>>>
>>> I see a lot of great ideas that have been discussed or proposed in the past to cover the most common rate-limiter candidate use cases. Do folks think we should file an official CEP and take it there?
>>> Jaydeep
>>>
>>> On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>>
>>> I just remembered the other day that I had done a quick writeup on the state of compaction stress-related throttling in the project:
>>>
>>> https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
>>>
>>> I’m sure most of it is old news to the people on this thread, but I figured I’d post it just in case :)
>>>
>>> On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>>
>>> 2.) We should make sure the links between the “known” root causes of cascading failures and the mechanisms we introduce to avoid them remain very strong.
>>>
>>> Seems to me that our historical strategy was to address individual known cases one by one rather than looking for a more holistic load-balancing and load-shedding solution. While the engineer in me likes the elegance of a broad, more-inclusive *actual SEDA-like* approach, the pragmatist in me wonders how far we think we are today from a stable set-point.
>>>
>>> i.e. are we facing a handful of cases where nodes can still get pushed over and then cascade that we can surgically address, or are we facing a broader lack of back-pressure that rears its head in different domains (client -> coordinator, coordinator -> replica, internode with other operations, etc.) at surprising times and should be considered more holistically?
>>>
>>> On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
>>>
>>> I almost forgot CASSANDRA-15817, which introduced reject_repair_compaction_threshold, which provides a mechanism to stop repairs while compaction is underwater.
>>>
>>> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>>
>>> Hey all,
>>>
>>> I’m a bit late to the discussion. I see that we’ve already discussed CASSANDRA-15013 <https://issues.apache.org/jira/browse/CASSANDRA-15013> and CASSANDRA-16663 <https://issues.apache.org/jira/browse/CASSANDRA-16663> at least in passing. Having written the latter, I’d be the first to admit it’s a crude tool, although it’s been useful here and there, and it provides a couple of primitives that may be useful for future work. As Scott mentions, while it is configurable at runtime, it is not adaptive, although we did make configuration easier in CASSANDRA-17423 <https://issues.apache.org/jira/browse/CASSANDRA-17423>. It is also global to the node, although we’ve lightly discussed some ideas around making it more granular. (For example, keyspace-based limiting, or limiting “domains” tagged by the client in requests, could be interesting.) It also does not deal with inter-node traffic, of course.
>>>
>>> Something we’ve not yet mentioned (that does address internode traffic) is CASSANDRA-17324 <https://issues.apache.org/jira/browse/CASSANDRA-17324>, which I proposed shortly after working on the native request limiter (and have just not had much time to return to). The basic idea is this:
>>>
>>> When a node is struggling under the weight of a compaction backlog and becomes a cause of increased read latency for clients, we have two safety valves:
>>>
>>> 1.) Disabling the native protocol server, which stops the node from coordinating reads and writes.
>>> 2.) Jacking up the severity on the node, which tells the dynamic snitch to avoid the node for reads from other coordinators.
>>>
>>> These are useful, but we don’t appear to have any mechanism that would allow us to temporarily reject internode hint, batch, and mutation messages that could further delay resolution of the compaction backlog.
>>>
>>> Whether it’s done as part of a larger framework or on its own, it still feels like a good idea.
>>>
>>> Thinking in terms of opportunity costs here (i.e. where we spend our finite engineering time to holistically improve the experience of operating this database) is healthy, but we probably haven’t reached the point of diminishing returns on nodes being able to protect themselves from clients and from other nodes. I would just keep in mind two things:
>>>
>>> 1.) The effectiveness of rate limiting in the system as a whole (which includes the database and all clients) necessarily decreases as we move from the application to the lowest-level database internals. Limiting correctly at the client will save more resources than limiting at the native protocol server, and limiting correctly at the native protocol server will save more resources than limiting after we’ve dispatched requests to some thread pool for processing.
>>>
>>> 2.) We should make sure the links between the “known” root causes of cascading failures and the mechanisms we introduce to avoid them remain very strong.
>>>
>>> In any case, I’d be happy to help out in any way I can as this moves forward (especially as it relates to our past/current attempts to address this problem space).
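To make the CASSANDRA-17324 idea above concrete, here is one rough sketch of that missing third valve, assuming inbound internode messages pass through a filter that can consult compaction metrics. The names and threshold are hypothetical illustrations; the actual proposal may differ:

    /**
     * Hypothetical inbound-message filter: while the local compaction backlog
     * is deep enough to hurt read latency, shed deferrable internode work
     * (hints, batches, mutations) so compaction can catch up.
     */
    public final class CompactionBackpressureFilter {
        public enum Verb { HINT, BATCH, MUTATION, READ, OTHER }

        private final long pendingCompactionsLimit;

        public CompactionBackpressureFilter(long pendingCompactionsLimit) {
            this.pendingCompactionsLimit = pendingCompactionsLimit;
        }

        /** @param pendingCompactions current backlog, e.g. from compaction metrics */
        public boolean shouldReject(Verb verb, long pendingCompactions) {
            if (pendingCompactions <= pendingCompactionsLimit)
                return false;
            // Only shed work with a retry path: hints and batchlog entries are
            // redelivered later, and an unacknowledged mutation can fall back to
            // a hint on the coordinator. Never shed reads here.
            return verb == Verb.HINT || verb == Verb.BATCH || verb == Verb.MUTATION;
        }
    }

Because everything shed here has a redelivery or hint-fallback path, the trade is delayed writes in exchange for a node that can actually dig out of its backlog, much like the shape of reject_repair_compaction_threshold for repairs.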