[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375853#comment-15375853
 ] 

Jonathan Ellis commented on CASSANDRA-9318:
-------------------------------------------

bq. I can't see any better options than what we implement in this patch for 
those use cases willing to trade performance for overall stability

I feel like we're going in circles here.  Here is a better option:

Pick a number for how much memory we can afford to have taken up by in-flight 
requests (remembering that we need to keep the entire payload around for 
potential hint writing) as a fraction of the heap, the way we do with memtables 
or key cache.  If we hit that mark we start throttling new requests and only 
accept them as old ones drain off.

This has the following benefits:

# Strictly better than the status quo.  I.e., does not make things worse where 
the current behavior is fine (single replica misbehaves, we write hints but 
don't slow things down), and makes things better where the current behavior is 
not (throttle instead of falling over with an OOM).
# No client-side logic is required; all the client sees is slower request 
acceptance when throttling kicks in.
# Gives us a metric we can expose to clients to improve load balancing.
# Does not require a lot of tuning.  (If the system is overloaded it will 
eventually reach even a relatively high mark.  If it doesn't, well, you're not 
going to OOM so you don't need to throttle.)
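
As a rough illustration of the proposal above, here is a minimal sketch of a byte budget for in-flight requests, sized as a fraction of the heap.  The class name InFlightLimiter, its methods, and the heap fraction parameter are all hypothetical; none of this is taken from the patch or from Cassandra's code.  The sketch only shows the accounting: a request reserves its payload size before being accepted, and releases it once it has drained off.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Cassandra's implementation.
public class InFlightLimiter {
    private final long limitBytes;
    private final AtomicLong inFlightBytes = new AtomicLong();

    public InFlightLimiter(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Assumed tunable: the budget as a fraction of the maximum heap,
    // analogous to how memtable or key cache sizes are configured.
    public static InFlightLimiter ofHeapFraction(double fraction) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        return new InFlightLimiter((long) (maxHeap * fraction));
    }

    // Reserve room for a request payload. Returns false when the budget
    // would be exceeded, in which case the caller should hold the request
    // back until older requests have released their bytes.
    // Note: a single payload larger than the whole budget is never accepted.
    public boolean tryAcquire(long payloadBytes) {
        while (true) {
            long current = inFlightBytes.get();
            long next = current + payloadBytes;
            if (next > limitBytes) {
                return false;
            }
            if (inFlightBytes.compareAndSet(current, next)) {
                return true;
            }
        }
    }

    // Called once the request has completed and any hint has been written.
    public void release(long payloadBytes) {
        inFlightBytes.addAndGet(-payloadBytes);
    }

    // Fraction of the budget in use; a value like this could be the
    // metric exposed to clients for load balancing.
    public double utilization() {
        return (double) inFlightBytes.get() / limitBytes;
    }
}
```

A coordinator using something like this would call tryAcquire before accepting a request and release when the request finishes, so acceptance slows down only when the budget is actually exhausted.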


> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Sergio Bossa
>         Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, 
> limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding 
> bytes and requests and if it reaches a high watermark disable read on client 
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't 
> introduce other issues.
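
The high/low watermark idea in the description above could be sketched roughly as follows.  WatermarkGate and its method names are hypothetical, the sketch tracks outstanding bytes only (not the request count), and how reads are actually disabled on a client connection is deliberately left out.

```java
// Hypothetical sketch of high/low watermark hysteresis.
public class WatermarkGate {
    private final long highWatermark;
    private final long lowWatermark;
    private long outstandingBytes = 0;
    private boolean readsEnabled = true;

    public WatermarkGate(long highWatermark, long lowWatermark) {
        if (lowWatermark > highWatermark) {
            throw new IllegalArgumentException("low watermark must not exceed high watermark");
        }
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    // Record a newly accepted request. Returns whether reads from client
    // connections should remain enabled afterwards.
    public synchronized boolean onRequestStart(long bytes) {
        outstandingBytes += bytes;
        if (readsEnabled && outstandingBytes >= highWatermark) {
            readsEnabled = false; // reached the high watermark: stop reading
        }
        return readsEnabled;
    }

    // Record a completed request. Reads are re-enabled only once the
    // outstanding bytes have dropped to the low watermark or below.
    public synchronized boolean onRequestComplete(long bytes) {
        outstandingBytes -= bytes;
        if (!readsEnabled && outstandingBytes <= lowWatermark) {
            readsEnabled = true;
        }
        return readsEnabled;
    }

    public synchronized boolean readsEnabled() {
        return readsEnabled;
    }
}
```

Keeping a gap between the two watermarks avoids toggling reads on and off for every request when the load hovers near the limit.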



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
