[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535356#comment-14535356 ]
Jonathan Ellis commented on CASSANDRA-9318:
-------------------------------------------
bq. it sounds like Jonathan is suggesting we simply prune our ExpiringMap based
on bytes tracked as well as time?
No, I'm suggesting we abort requests more aggressively with OverloadedException
*before sending them to replicas*. One place this might make sense is
sendToHintedEndpoints, where we already throw OE.
Right now we only throw OE once we start writing hints for a node that is in
trouble. This doesn't seem to be aggressive enough. (Although most of our
users are on 2.0, where we allowed 8x as many hints in flight before starting
to throttle.)
So, I am suggesting we also track outstanding requests (perhaps with the
ExpiringMap, as you suggest) and stop accepting requests once we hit a
reasonable limit of "you can't possibly process more requests than this in
parallel."
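To make the idea concrete, here is a minimal sketch of a coordinator-side
counter that rejects new requests once a fixed number are outstanding. The
names InFlightLimiter and OverloadedSketchException and the limit value are
hypothetical; the exception is only a stand-in for OverloadedException, and
none of this is existing Cassandra code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for OverloadedException; not the real Cassandra class.
class OverloadedSketchException extends RuntimeException {
    OverloadedSketchException(String message) { super(message); }
}

// Hypothetical limiter on requests outstanding at the coordinator.
class InFlightLimiter {
    private final long maxInFlight;
    private final AtomicLong inFlight = new AtomicLong();

    InFlightLimiter(long maxInFlight) { this.maxInFlight = maxInFlight; }

    // Call before sending a request to replicas. Rejects instead of buffering.
    void acquire() {
        if (inFlight.incrementAndGet() > maxInFlight) {
            inFlight.decrementAndGet(); // undo the reservation we just made
            throw new OverloadedSketchException(
                "Too many in-flight requests (limit " + maxInFlight + ")");
        }
    }

    // Call once the request has completed or timed out.
    void release() { inFlight.decrementAndGet(); }

    long inFlight() { return inFlight.get(); }
}
```

A caller would wrap each outgoing request in acquire() and release(), so a
rejected request never reaches the replicas or the coordinator's buffers.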
> The ExpiringMap requests are already "in-flight" and cannot be cancelled, so
> their effect on other nodes cannot be rescinded, and imposing a limit does
> not stop us issuing more requests to the nodes in the cluster that are
> failing to keep up and respond to us.
Right, and I'm fine with that. The goal is not to keep the replica completely
out of trouble. The goal is to keep the coordinator from falling over from
buffering ExpiringMap and MessagingService entries that it can't drain fast
enough.
Secondarily, this will help the replica too: our existing load shedding
recovers fine from temporary spikes in load, but it isn't good enough to save
a replica that is already overwhelmed when the coordinators keep throwing more
at it.
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 2.1.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and if it reaches a high watermark disable read on client
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.