[
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535369#comment-14535369
]
Benedict commented on CASSANDRA-9318:
-------------------------------------
bq. because our existing load shedding is fine at recovering from temporary
spikes in load
Are you certain? The recent testing Ariel did on CASSANDRA-8670 demonstrated
the MUTATION stage was what was bringing the cluster down, not the ExpiringMap;
and this was in a small cluster.
On top of that practical datapoint, I suspect our ability to prune these
messages is also theoretically worse, because pruning happens on dequeue,
whereas ExpiringMap expiry (whilst having a slightly longer timeout) runs
asynchronously and cannot be blocked by e.g. flush.
The coordinator is also on the "right side" of the equation: as the cluster
grows, any single node's problems should spread out to the coordinators more
slowly, whereas a coordinator's ability to flood a processing node scales up
at the same (well, inverted) rate.
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 2.1.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and, if a high watermark is reached, disable read on
> client connections until it falls back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)