[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535369#comment-14535369 ]

Benedict commented on CASSANDRA-9318:
-------------------------------------

bq. because our existing load shedding is fine at recovering from temporary 
spikes in load

Are you certain? The recent testing Ariel did on CASSANDRA-8670 demonstrated 
that it was the MUTATION stage, not the ExpiringMap, that was bringing the 
cluster down; and that was in a small cluster.

If anything, I suspect our ability to prune these messages is also 
theoretically worse, on top of this practical datapoint: the replica only 
prunes on dequeue, so pruning can be blocked by e.g. a flush, whereas the 
ExpiringMap (whilst having a slightly longer expiry) is reaped asynchronously 
and cannot be blocked.
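To illustrate the distinction (a sketch only; the names are hypothetical, not our actual classes):

{code:java}
// Replica-side shedding can only happen at the moment the MUTATION stage
// dequeues a task, so a blocked stage (e.g. waiting behind a flush) sheds
// nothing; a coordinator-side expiring map is reaped by its own thread.
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class ExpirySketch
{
    static final long TIMEOUT_NANOS = TimeUnit.SECONDS.toNanos(2);

    // Replica side: the expiry check runs only when the task is dequeued.
    static void executeMutation(long enqueuedAtNanos, Runnable mutation)
    {
        if (System.nanoTime() - enqueuedAtNanos > TIMEOUT_NANOS)
            return;            // too old to matter; drop it
        mutation.run();
    }

    // Coordinator side: a dedicated reaper prunes stale callbacks regardless
    // of what the request-processing threads are doing.
    static final Map<Long, Long> callbackCreationTimes = new ConcurrentHashMap<>();
    static final ScheduledExecutorService reaper = Executors.newSingleThreadScheduledExecutor();
    static
    {
        reaper.scheduleAtFixedRate(() -> {
            long now = System.nanoTime();
            for (Iterator<Map.Entry<Long, Long>> it = callbackCreationTimes.entrySet().iterator(); it.hasNext();)
                if (now - it.next().getValue() > TIMEOUT_NANOS)
                    it.remove();     // expire the callback and free its memory
        }, 1, 1, TimeUnit.SECONDS);
    }
}
{code}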

The coordinator is also on the "right side" of the equation: as the cluster 
grows, any single node's problems spread out to the coordinators more slowly, 
whereas the coordinators' collective ability to flood a struggling node 
scales up at the same (well, inverted) rate.
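To put rough numbers on it (illustrative only): with RF=3 in an N-node 
cluster, a given replica receives roughly 3/N of each coordinator's writes, 
so one slow replica degrades any individual coordinator only fractionally, 
and ever less so as N grows; meanwhile all of the other coordinators can keep 
queueing work onto that one replica, so the pressure it can be put under 
grows with N. Bounding in-flight work at the coordinator caps the side of 
that relationship that scales with cluster size.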

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might track the number of outstanding bytes and requests 
> and, once a high watermark is reached, disable read on client connections 
> until the counts drop back below some low watermark.
> Need to make sure that disabling read on the client connection won't 
> introduce other issues.
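
For reference, a minimal sketch of the high/low watermark idea (not an 
actual patch; it assumes a Netty-style native-protocol pipeline, and names 
such as InflightLimiter and the byte thresholds are hypothetical):

{code:java}
import io.netty.channel.Channel;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class InflightLimiter
{
    // Hypothetical thresholds; real values would presumably be configurable.
    private static final long HIGH_WATERMARK_BYTES = 256L * 1024 * 1024;
    private static final long LOW_WATERMARK_BYTES  = 192L * 1024 * 1024;

    private final AtomicLong inflightBytes = new AtomicLong();
    private final Set<Channel> clients = ConcurrentHashMap.newKeySet();
    private volatile boolean paused;

    void register(Channel client)   { clients.add(client); }
    void unregister(Channel client) { clients.remove(client); }

    // Called when a request is read off a client connection.
    void onRequestReceived(long requestBytes)
    {
        if (inflightBytes.addAndGet(requestBytes) >= HIGH_WATERMARK_BYTES && !paused)
        {
            paused = true;
            // Stop reading from clients; TCP backpressure pushes load back on them.
            clients.forEach(c -> c.config().setAutoRead(false));
        }
    }

    // Called once the coordinator has finished (or timed out) the request.
    void onRequestCompleted(long requestBytes)
    {
        if (inflightBytes.addAndGet(-requestBytes) <= LOW_WATERMARK_BYTES && paused)
        {
            paused = false;
            clients.forEach(c -> c.config().setAutoRead(true));
        }
    }
}
{code}

Turning off autoRead simply stops draining the client sockets, so the excess 
load backs up into the clients' TCP buffers; validating that this does not 
introduce other issues is exactly the open question noted in the description.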


