[
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344958#comment-15344958
]
Sergio Bossa commented on CASSANDRA-9318:
-----------------------------------------
I would like to reopen this ticket and propose the following patch to implement
coordinator-based back-pressure:
|[3.0 patch|https://github.com/apache/cassandra/compare/cassandra-3.0...sbtourist:CASSANDRA-9318-3.0?expand=1]|
|[testall|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-3.0-testall/]|
|[dtest|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-3.0-dtest/]|
The above patch provides full-blown, end-to-end,
replica-to-coordinator-to-client *write* back-pressure, based on the following
main concepts:
* The replica itself has no back-pressure knowledge: it keeps trying to write
mutations as fast as possible, and still applies load shedding.
* The coordinator tracks the back-pressure state *per replica*, which in the
current implementation consists of the incoming and outgoing rates of messages
from/to that replica.
* The coordinator is configured with a back-pressure strategy that, based on
the back-pressure state, applies a given back-pressure algorithm when sending
mutations to each replica.
* The provided default strategy is based on the incoming/outgoing message
ratio, which is used to rate-limit outgoing messages towards a given replica.
* The back-pressure strategy is also in charge of signalling the coordinator
when a given replica is considered "overloaded", in which case an
{{OverloadedException}} is thrown to the client for all mutation requests
deemed "overloading", until the strategy considers the overloaded state to be
over.
* The provided default strategy uses configurable low/high thresholds to
either rate-limit outgoing mutations or throw an {{OverloadedException}} back
to clients (see the sketch after this list).
While all of that might sound complex, the patch is actually surprisingly
simple. I provided as many unit tests as possible, and I've also tested it on
a 2-node CCM cluster, using [ByteMan|http://byteman.jboss.org/] to simulate a
slow replica, and I'd say the results are quite promising: as an example, see
the attached ByteMan script and plots, which show a cluster with no
back-pressure dropping ~200k mutations, while a cluster with back-pressure
enabled dropped only ~2k, which means less coordinator overload and a replica
state that is easily recoverable via hints.
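For reference, a slow replica can be simulated with a rule along these lines
(an illustrative sketch, not the attached script: the intercepted class/method
and the delay value are assumptions):
{code}
# Hypothetical ByteMan rule: delay every mutation applied on the target
# node so that it behaves like a slow replica.
RULE delay mutation apply
CLASS org.apache.cassandra.db.Mutation
METHOD apply
AT ENTRY
IF TRUE
DO Thread.sleep(50)
ENDRULE
{code}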
I can foresee at least two open points:
* We might want to track more "back-pressure state" to allow implementing
different strategies; I personally believe strategies based on in/out rates are
the most appropriate ones to avoid *both* the overloading and dropped mutations
problems, but people might think differently.
* When the {{OverloadedException}} is (eventually) thrown, some requests might
have already been sent, which is exactly what currently happens with hint
overloading too: we might want to check both kinds of overloading before
actually sending any mutations to replicas.
Thoughts?
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths, Streaming and Messaging
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and, if a high watermark is reached, disable reads on
> client connections until the count drops back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.
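> A minimal sketch of that watermark idea, assuming Netty channels on the
> client side (all names here are hypothetical):
> {code:java}
> import io.netty.channel.Channel;
> import java.util.concurrent.atomic.AtomicLong;
>
> public final class InFlightBytesTracker
> {
>     private final long lowWatermark;
>     private final long highWatermark;
>     private final AtomicLong inFlightBytes = new AtomicLong();
>
>     public InFlightBytesTracker(long lowWatermark, long highWatermark)
>     {
>         this.lowWatermark = lowWatermark;
>         this.highWatermark = highWatermark;
>     }
>
>     // Called when a request is admitted from a client channel.
>     public void onRequest(Channel channel, long bytes)
>     {
>         if (inFlightBytes.addAndGet(bytes) >= highWatermark)
>             channel.config().setAutoRead(false); // stop reading new requests
>     }
>
>     // Called once the corresponding response has been flushed.
>     public void onResponse(Channel channel, long bytes)
>     {
>         if (inFlightBytes.addAndGet(-bytes) <= lowWatermark)
>             channel.config().setAutoRead(true); // resume reading
>     }
> }
> {code}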
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)