[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366583#comment-15366583
 ] 

Sergio Bossa commented on CASSANDRA-9318:
-----------------------------------------

[~Stefania], [~slebresne],

I've pushed a few more commits to address your concerns.

First of all, I've got rid of the back-pressure timeout: the back-pressure 
window for the rate-based algorithm is now equal to the write timeout, and the 
overall implementation has been improved to better track in/out rates and avoid 
the need for a larger window. More specifically, the rates are now tracked 
together when either a response is received or the callback expires, which 
avoids edge cases causing an unbalanced in/out rate when a burst of outgoing 
messages is recorded on the edge of a window.
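
To illustrate the tracking described above, here is a minimal sketch. It is not 
the code from the patch: the class and method names ({{InOutRates}}, 
{{onResponseReceived}}, {{onCallbackExpired}}, {{ratio}}) are hypothetical, and 
windowing over the write timeout is left out. The point it shows is that an 
outgoing request is only counted once its outcome is known, so a burst of 
sends at the edge of a window cannot skew the ratio.

```java
/** Hypothetical sketch; names and structure are assumptions, not the patch's classes. */
final class InOutRates {
    private long incoming; // responses received from the replica
    private long outgoing; // requests whose outcome (response or expiry) is known

    /** A replica answered: count the response and its request together. */
    synchronized void onResponseReceived() {
        incoming++;
        outgoing++;
    }

    /** The callback expired: the request went out but no response came back. */
    synchronized void onCallbackExpired() {
        outgoing++;
    }

    /** Incoming/outgoing ratio; 1.0 when no request has completed yet. */
    synchronized double ratio() {
        return outgoing == 0 ? 1.0 : (double) incoming / outgoing;
    }
}
```

A ratio well below 1.0 would then indicate a replica that is falling behind; 
how that ratio is turned into a rate limit is up to the strategy.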

Also, I've abstracted {{BackPressureState}} into an interface as requested.

Configuration-wise, we're now left with only the {{back_pressure_enabled}} 
boolean and the {{back_pressure_strategy}}, and I'd really like to keep the 
former, as it makes it much easier to dynamically turn back-pressure on/off.

Talking about the overloaded state and the usage of {{OverloadedException}}, I 
agree the latter might be misleading, and I agree some failure conditions could 
lead to requests being wrongly refused, but I'd also like to keep some form of 
"emergency" feedback towards the client: what about throwing OE only if _all_ 
(or a given number depending on the CL?) replicas are overloaded?
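
One possible reading of that proposal, as a sketch only: reject a request when 
the non-overloaded replicas are too few to satisfy the consistency level. The 
names {{OverloadCheck}}, {{shouldReject}} and {{requiredResponses}} are 
hypothetical, and the exact threshold is an assumption open to discussion.

```java
import java.util.List;

/** Hypothetical sketch of a CL-aware overload check; not part of the patch. */
final class OverloadCheck {
    /**
     * @param overloaded        one flag per replica, true if that replica is overloaded
     * @param requiredResponses number of responses the consistency level needs
     * @return true if too few non-overloaded replicas remain to satisfy the CL
     */
    static boolean shouldReject(List<Boolean> overloaded, int requiredResponses) {
        long healthy = overloaded.stream().filter(o -> !o).count();
        return healthy < requiredResponses;
    }
}
```

With {{requiredResponses}} set to 1 this reduces to the "all replicas 
overloaded" case; with a quorum of 2 out of 3 replicas, the request would be 
refused once two of them are overloaded.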

Regarding when and how to ship this, I'm fine with trunk and I agree it should 
be off by default for now.

Finally, one more wild idea to consider: given this patch greatly reduces the 
number of dropped mutations, and hence the number of in-flight hints, what do 
you think about disabling load shedding on the replica side when back-pressure 
is enabled? This way we'd trade "full consistency" for a hopefully smaller 
number of unnecessary hints sent over to "pressured" replicas when their 
callbacks expire on the coordinator side.

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Sergio Bossa
>         Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, 
> limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding 
> bytes and requests and if it reaches a high watermark disable read on client 
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't 
> introduce other issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
