[
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366583#comment-15366583
]
Sergio Bossa commented on CASSANDRA-9318:
-----------------------------------------
[~Stefania], [~slebresne],
I've pushed a few more commits to address your concerns.
First of all, I've got rid of the back-pressure timeout: the back-pressure
window for the rate-based algorithm is now equal to the write timeout, and the
overall implementation has been improved to better track in/out rates and avoid
the need for a larger window. More specifically, the rates are now tracked
together when either a response is received or the callback expires, which
avoids edge cases causing an unbalanced in/out rate when a burst of outgoing
messages is recorded on the edge of a window.
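To illustrate the "tracked together" idea, here is a minimal sketch. It is not the patch itself: the class name, the method names and the simple fixed-window reset are all assumptions made for this example. Both counters are updated only when a callback completes (response or expiry), so a burst of sends at the edge of a window cannot inflate the outgoing count on its own.

```java
// Hypothetical sketch; class and method names are illustrative, not Cassandra's API.
final class InOutRateWindowSketch {
    private final long windowNanos;   // assumed equal to the write timeout
    private long windowStartNanos;
    private long outgoing;            // requests whose callback completed in this window
    private long incoming;            // of those, the ones that got a response

    InOutRateWindowSketch(long writeTimeoutNanos, long nowNanos) {
        this.windowNanos = writeTimeoutNanos;
        this.windowStartNanos = nowNanos;
    }

    // A response arrived: count the request as both sent and answered.
    synchronized void onResponseReceived(long nowNanos) {
        rollWindow(nowNanos);
        outgoing++;
        incoming++;
    }

    // The callback expired without a response: count it as sent only.
    synchronized void onCallbackExpired(long nowNanos) {
        rollWindow(nowNanos);
        outgoing++;
    }

    // Ratio of answered to completed requests in the current window (1.0 when idle).
    synchronized double inOutRatio(long nowNanos) {
        rollWindow(nowNanos);
        return outgoing == 0 ? 1.0 : (double) incoming / outgoing;
    }

    // Start a fresh window once the current one is at least windowNanos old.
    private void rollWindow(long nowNanos) {
        if (nowNanos - windowStartNanos >= windowNanos) {
            windowStartNanos = nowNanos;
            outgoing = 0;
            incoming = 0;
        }
    }
}
```

A ratio well below 1.0 would suggest the replica is falling behind; how that ratio maps to a rate limit is left out of this sketch.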
Also, I've abstracted {{BackPressureState}} into an interface as requested.
Configuration-wise, we're now left with only the {{back_pressure_enabled}}
boolean and the {{back_pressure_strategy}}, and I'd really like to keep the
former, as it makes it much easier to dynamically turn back-pressure on/off.
Talking about the overloaded state and the usage of {{OverloadedException}}, I
agree the latter might be misleading, and I agree some failure conditions could
lead to requests being wrongly refused, but I'd also like to keep some form of
"emergency" feedback towards the client: what about throwing OE only if _all_
(or a given number depending on the CL?) replicas are overloaded?
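As a rough sketch of the two options in that question (all names here are hypothetical, and the CL-dependent variant is only one possible reading of "a given number depending on the CL"):

```java
// Hypothetical sketch; not part of the patch. Each boolean says whether the
// corresponding replica is currently considered overloaded.
final class OverloadDecisionSketch {
    // Option 1: refuse the request only if every target replica is overloaded.
    static boolean allOverloaded(boolean... replicaOverloaded) {
        if (replicaOverloaded.length == 0) {
            return false; // no replicas, nothing to refuse on
        }
        for (boolean overloaded : replicaOverloaded) {
            if (!overloaded) {
                return false;
            }
        }
        return true;
    }

    // Option 2 (assumed interpretation): refuse if fewer non-overloaded replicas
    // remain than the consistency level needs acknowledgements from.
    static boolean tooFewHealthy(int requiredAcks, boolean... replicaOverloaded) {
        int healthy = 0;
        for (boolean overloaded : replicaOverloaded) {
            if (!overloaded) {
                healthy++;
            }
        }
        return healthy < requiredAcks;
    }
}
```

With three replicas and two required acknowledgements, option 2 would refuse the request as soon as two replicas are overloaded, while option 1 would wait until all three are.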
Regarding when and how to ship this, I'm fine with trunk and I agree it should
be off by default for now.
Finally, one more wild idea to consider: given this patch greatly reduces the
number of dropped mutations, and hence the number of inflight hints, what do
you think about disabling load shedding on the replica side when back-pressure
is enabled? This way we'd trade "full consistency" for a hopefully smaller
number of unnecessary hints sent over to "pressured" replicas when their
callbacks expire on the coordinator side.
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths, Streaming and Messaging
> Reporter: Ariel Weisberg
> Assignee: Sergio Bossa
> Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png,
> limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and if it reaches a high watermark disable read on client
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)