[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344958#comment-15344958 ]

Sergio Bossa commented on CASSANDRA-9318:
-----------------------------------------

I would like to reopen this ticket and propose the following patch to implement 
coordinator-based back-pressure:

| [3.0 patch|https://github.com/apache/cassandra/compare/cassandra-3.0...sbtourist:CASSANDRA-9318-3.0?expand=1] | [testall|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-3.0-testall/] | [dtest|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-3.0-dtest/] |

The above patch provides full-blown, end-to-end, 
replica-to-coordinator-to-client *write* back-pressure, based on the following 
main concepts:
* The replica itself has no back-pressure knowledge: it keeps applying mutations as fast as possible and still performs load shedding.
* The coordinator tracks the back-pressure state *per replica*; in the current implementation this consists of the rates of incoming and outgoing messages from/to that replica.
* The coordinator is configured with a back-pressure strategy that, based on the back-pressure state, applies a given back-pressure algorithm when sending mutations to each replica.
* The provided default strategy is based on the incoming/outgoing message ratio, which is used to rate-limit outgoing messages towards a given replica.
* The back-pressure strategy is also in charge of signalling the coordinator when a given replica is considered "overloaded"; in that case an {{OverloadedException}} is thrown back to the client for every mutation request deemed "overloading", until the strategy considers the overloaded state over.
* The provided default strategy uses configurable low/high thresholds to decide whether to rate limit outgoing mutations or throw an {{OverloadedException}} back to clients (see the sketch right after this list).
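
To make this concrete, here's a rough sketch of what a ratio-based strategy could look like; all names ({{ReplicaBackPressureInfo}}, {{throttleDelayMillis}}, the thresholds) are just illustrative, not the actual identifiers in the patch:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Per-replica back-pressure state tracked by the coordinator (illustrative).
final class ReplicaBackPressureInfo
{
    final AtomicLong outgoing = new AtomicLong(); // mutations sent in the current window
    final AtomicLong incoming = new AtomicLong(); // responses received in the current window
}

final class RatioBackPressureStrategy
{
    private final double lowRatio;  // at/below this in/out ratio: replica is overloaded
    private final double highRatio; // at/above this: replica keeps up, no throttling

    RatioBackPressureStrategy(double lowRatio, double highRatio)
    {
        if (lowRatio >= highRatio)
            throw new IllegalArgumentException("lowRatio must be < highRatio");
        this.lowRatio = lowRatio;
        this.highRatio = highRatio;
    }

    /** Millis to pause before the next outgoing mutation, or throw if overloaded. */
    long throttleDelayMillis(ReplicaBackPressureInfo info, long maxDelayMillis)
    {
        double out = info.outgoing.get();
        double in  = info.incoming.get();
        double ratio = (out == 0) ? 1.0 : in / out;

        if (ratio <= lowRatio)
            // The coordinator would turn this into an OverloadedException for the client.
            throw new IllegalStateException("replica overloaded: in/out ratio " + ratio);

        if (ratio >= highRatio)
            return 0; // replica keeping up: no back-pressure needed

        // Between thresholds: rate limit proportionally to how far behind the replica is.
        double deficit = (highRatio - ratio) / (highRatio - lowRatio);
        return Math.round(maxDelayMillis * deficit);
    }
}
{code}

The actual patch is of course more involved (rate windows, counter resets, integration with the messaging service), but the thresholds-plus-ratio shape above is the core idea.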

While all of that might seem complex, the patch is actually surprisingly 
simple. I provided as many unit tests as possible, and I've also tested it on a 
2-node CCM cluster, using [ByteMan|http://byteman.jboss.org/] to simulate a 
slow replica. I'd say the results are quite promising: as an example, see the 
attached ByteMan script and plots, which show a cluster with no back-pressure 
dropping ~200k mutations, while a cluster with back-pressure enabled drops 
only ~2k; that means less coordinator overload and a replica state that is 
easily recoverable via hints.

I can foresee at least two open points:
* We might want to track more "back-pressure state" to allow implementing different strategies; I personally believe strategies based on in/out rates are the most appropriate ones to avoid *both* the overloading and dropped-mutations problems, but people might think differently.
* When the {{OverloadedException}} is (eventually) thrown, some requests might already have been sent, which is exactly what currently happens with hint overloading too: we might want to check both kinds of overloading before actually sending any mutations to replicas (see the sketch after this list).
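
For that second point, the idea would roughly be the following (again just an illustrative sketch with hypothetical names, not code from the patch):

{code:java}
import java.net.InetAddress;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

final class UpFrontOverloadCheck
{
    // Check every target replica first, so the exception (if any) is thrown
    // before a single mutation leaves the coordinator.
    static void sendToReplicas(List<InetAddress> replicas,
                               Predicate<InetAddress> isOverloaded,
                               Consumer<InetAddress> send)
    {
        for (InetAddress replica : replicas)
            if (isOverloaded.test(replica))
                throw new IllegalStateException("OverloadedException for " + replica);

        replicas.forEach(send);
    }
}
{code}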

Thoughts?

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding 
> bytes and requests and, if it reaches a high watermark, disable read on client 
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't 
> introduce other issues.
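
As a side note on the quoted description, a minimal sketch of that high/low watermark idea might look like the following, assuming Netty-style channels (as used by Cassandra's native transport); all class and method names here are illustrative only:

{code:java}
import io.netty.channel.Channel;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: track in-flight request bytes and toggle socket reads.
final class InflightBytesTracker
{
    private final long lowWatermarkBytes;
    private final long highWatermarkBytes;
    private final AtomicLong inflightBytes = new AtomicLong();

    InflightBytesTracker(long lowWatermarkBytes, long highWatermarkBytes)
    {
        this.lowWatermarkBytes = lowWatermarkBytes;
        this.highWatermarkBytes = highWatermarkBytes;
    }

    void onRequestReceived(Channel channel, long requestBytes)
    {
        if (inflightBytes.addAndGet(requestBytes) >= highWatermarkBytes)
            channel.config().setAutoRead(false); // stop reading from this client
    }

    void onRequestCompleted(Channel channel, long requestBytes)
    {
        if (inflightBytes.addAndGet(-requestBytes) <= lowWatermarkBytes)
            channel.config().setAutoRead(true); // resume reading
    }
}
{code}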


