[
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534260#comment-14534260
]
Benedict commented on CASSANDRA-9318:
-------------------------------------
bq. In my mind in-flight means that until the response is sent back to the
client the request counts against the in-flight limit.
bq. We already keep the original request around until we either get acks from
all replicas, or they time out (and we write a hint).
Perhaps we need to state more clearly what this ticket is proposing. I
understood it to mean what Ariel commented here, whereas it sounds like
Jonathan is suggesting we simply prune our ExpiringMap based on bytes tracked
as well as time?
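For concreteness, something like the following is how I read the byte-based
pruning suggestion; the class and its structure are purely illustrative, not
our actual ExpiringMap:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only: an expiring map pruned by total tracked bytes as
// well as by age. Names and structure are invented, not Cassandra's ExpiringMap.
public class ByteBoundedExpiringMap<K, V>
{
    private static class Entry<K>
    {
        final K key;
        final long bytes;
        final long createdAtNanos;
        Entry(K key, long bytes, long createdAtNanos)
        {
            this.key = key;
            this.bytes = bytes;
            this.createdAtNanos = createdAtNanos;
        }
    }

    private final Map<K, V> map = new ConcurrentHashMap<>();
    private final ConcurrentLinkedQueue<Entry<K>> insertionOrder = new ConcurrentLinkedQueue<>();
    private final AtomicLong trackedBytes = new AtomicLong();
    private final long maxBytes;
    private final long expiryNanos;

    public ByteBoundedExpiringMap(long maxBytes, long expiryNanos)
    {
        this.maxBytes = maxBytes;
        this.expiryNanos = expiryNanos;
    }

    public void put(K key, V value, long bytes)
    {
        map.put(key, value);
        insertionOrder.add(new Entry<>(key, bytes, System.nanoTime()));
        trackedBytes.addAndGet(bytes);
        prune();
    }

    public V remove(K key)
    {
        // Bytes for a completed entry are only released once its stale queue
        // entry is eventually pruned; good enough for a sketch.
        return map.remove(key);
    }

    // Evict the oldest entries while we are over the byte limit or the head
    // of the insertion queue has exceeded its time to live.
    private void prune()
    {
        long now = System.nanoTime();
        Entry<K> head;
        while ((head = insertionOrder.peek()) != null
               && (trackedBytes.get() > maxBytes || now - head.createdAtNanos > expiryNanos))
        {
            if (insertionOrder.remove(head))
            {
                trackedBytes.addAndGet(-head.bytes);
                map.remove(head.key); // dropping here is what "pruning by bytes" would mean
            }
        }
    }
}
{code}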
The ExpiringMap requests are already "in-flight" and cannot be cancelled, so
their effect on other nodes cannot be rescinded, and imposing a limit does not
stop us issuing more requests to the nodes in the cluster that are failing to
keep up and respond to us. It _might_ be sensible, if we introduce more
aggressive shedding at the processing nodes, to also shed the response handlers
more aggressively locally, but I'm not convinced it would have a significant
impact on cluster health by itself; cluster instability spreads out from
problematic nodes, and this scheme still permits us to inundate those nodes
with requests they cannot keep up with.
Alternatively, the approach of forbidding new requests while you have items in
the ExpiringMap causes the collapse of other nodes to spread throughout the
cluster, as rapidly (especially on small clusters) all requests in the system
become destined for the collapsing node, and every coordinator stops accepting
requests. The system seizes up, and that node is still failing, since it has
requests from the entire cluster queued up with it. In general the coordinator
has no way of distinguishing between a failed node, a network partition, and a
node that is merely struggling, so we don't know whether we should wait.
Some mix of the two might be possible: we could wait while a node is merely
slow, then drop our response handlers for the node once it's marked as down.
The latter may not be a bad thing to do anyway, but I would not want to depend
on this behaviour to maintain our precious "A".
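A rough sketch of what that mixed behaviour could look like at the coordinator;
the ResponseHandler interface and onNodeDown hook here are illustrative
stand-ins, not our real classes:
{code:java}
import java.net.InetAddress;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: keep waiting while a peer is merely slow, but
// proactively drop its pending response handlers once the failure detector
// marks it down. Names are invented for the example.
public class PendingResponses
{
    public interface ResponseHandler
    {
        InetAddress target();
        void onFailure();   // e.g. count towards the timeout / write a hint
    }

    private final Map<Integer, ResponseHandler> pending = new ConcurrentHashMap<>();

    public void register(int messageId, ResponseHandler handler)
    {
        pending.put(messageId, handler);
    }

    public ResponseHandler complete(int messageId)
    {
        return pending.remove(messageId);
    }

    // Invoked by a failure-detector listener when 'down' is convicted.
    public void onNodeDown(InetAddress down)
    {
        Iterator<Map.Entry<Integer, ResponseHandler>> it = pending.entrySet().iterator();
        while (it.hasNext())
        {
            ResponseHandler handler = it.next().getValue();
            if (handler.target().equals(down))
            {
                it.remove();
                handler.onFailure(); // fail fast instead of waiting for the rpc timeout
            }
        }
    }
}
{code}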
It still seems the simplest and most robust solution is to make our work queues
leaky, since this insulates the processing nodes from cluster-wide inundation,
which the coordinator approach cannot do (even with the loss of "A" and the
cessation of processing, there is a whole cluster's worth of work versus a
potentially single node; with hash partitioning it doesn't take long for all
processing to begin involving a single failing node). We can do this on the
number of requests alone and still be much better off than we are currently.
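To illustrate, a minimal sketch of a leaky stage queue, bounded by request
count, where the oldest queued work is shed once the bound is hit; the pool
sizes and the limit are made up for the example:
{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of a "leaky" work queue for a processing stage: bounded
// by request count, and once full the oldest queued work is discarded rather
// than letting the whole node back up. Limits here are invented.
public class LeakyStage
{
    private static final int MAX_QUEUED_REQUESTS = 4096;

    private final AtomicLong dropped = new AtomicLong();

    private final RejectedExecutionHandler dropOldest = (task, executor) ->
    {
        // Shed the oldest queued request; the coordinator's timeout / hint
        // machinery is what eventually notices the dropped work.
        Runnable oldest = executor.getQueue().poll();
        if (oldest != null)
            dropped.incrementAndGet();
        executor.execute(task);
    };

    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
            32, 32, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(MAX_QUEUED_REQUESTS),
            dropOldest);

    public void submit(Runnable work)
    {
        executor.execute(work);
    }

    public long droppedCount()
    {
        return dropped.get();
    }
}
{code}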
We could also pair this with coordinator-level dropping of handlers for "down"
nodes, and with dropping above some threshold of outstanding handlers. The
latter, though, could result in widespread uncoordinated dropping of requests,
which may leave us open to a multiplying effect of cluster overload, with each
node dropping different requests, possibly leading to only a tiny fraction of
requests being serviced to their required CL across the cluster. I'm not sure
how we can best model this risk, or how to avoid it without notifying
coordinators when a message is dropped, and I don't see that being delivered
for 2.1.
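For illustration, a sketch of what per-replica threshold shedding at the
coordinator might look like; the class name and the limit are invented for the
example:
{code:java}
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of the threshold variant: the coordinator tracks
// in-flight requests per replica and, beyond a cap, sheds new handlers for
// that replica instead of registering them. The risk discussed above is that
// each coordinator sheds a different subset of requests.
public class PerReplicaInFlightLimiter
{
    private static final long MAX_IN_FLIGHT_PER_REPLICA = 1024; // invented for the example

    private final Map<InetAddress, AtomicLong> inFlight = new ConcurrentHashMap<>();

    /** @return true if a handler may be registered for this replica. */
    public boolean tryAcquire(InetAddress replica)
    {
        AtomicLong count = inFlight.computeIfAbsent(replica, r -> new AtomicLong());
        if (count.incrementAndGet() > MAX_IN_FLIGHT_PER_REPLICA)
        {
            count.decrementAndGet();
            return false; // shed locally; caller fails/hints this replica immediately
        }
        return true;
    }

    public void release(InetAddress replica)
    {
        AtomicLong count = inFlight.get(replica);
        if (count != null)
            count.decrementAndGet();
    }
}
{code}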
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 2.1.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and if it reaches a high watermark disable read on client
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.
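A minimal sketch of the watermark idea described above, assuming Netty-style
client connections (Channel#config().setAutoRead(boolean) is a real Netty
call; the thresholds and the registry class itself are illustrative):
{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

import io.netty.channel.Channel;

// Illustrative sketch: track outstanding request bytes and stop reading from
// client connections above a high watermark, resuming below a low watermark.
// Thresholds and class names are invented for the example.
public class InFlightLimiter
{
    private static final long HIGH_WATERMARK_BYTES = 256L * 1024 * 1024;
    private static final long LOW_WATERMARK_BYTES  = 192L * 1024 * 1024;

    private final AtomicLong inFlightBytes = new AtomicLong();
    private final Set<Channel> clientChannels = ConcurrentHashMap.newKeySet();
    private volatile boolean paused = false;

    public void register(Channel channel)
    {
        clientChannels.add(channel);
    }

    public void onRequestStart(long requestBytes)
    {
        long total = inFlightBytes.addAndGet(requestBytes);
        if (!paused && total > HIGH_WATERMARK_BYTES)
            setAutoRead(false); // stop reading new requests from clients
    }

    public void onRequestComplete(long requestBytes)
    {
        long total = inFlightBytes.addAndGet(-requestBytes);
        if (paused && total < LOW_WATERMARK_BYTES)
            setAutoRead(true);  // resume once we drain below the low watermark
    }

    private synchronized void setAutoRead(boolean autoRead)
    {
        paused = !autoRead;
        for (Channel channel : clientChannels)
            channel.config().setAutoRead(autoRead);
    }
}
{code}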
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)