[
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539990#comment-14539990
]
Jonathan Ellis commented on CASSANDRA-9318:
-------------------------------------------
bq. the cluster as a whole, especially in large clusters, can still send a lot
of requests to a single node
Yes. But I still think this is useful:
# If you hit OverloadedException (OE) while bulk loading into your initial
cluster of dozens of nodes, you will remember that lesson as you grow into a
large one of hundreds
# Even if we can't keep a replica from needing to load shed after a GC pause,
we can keep the *coordinator* from falling over, which is what turns a
single-node hiccup into a cluster-wide problem.
bq. it has the opposite impact of (and likely prevents) CASSANDRA-3852, with
older operations completely blocking newer ones
IMO declining requests that we won't be able to process is preferable from a
user's perspective anyway.
bq. it might mean a lot more OE than users are used to during temporary blips,
pushing problems down to clients, when the cluster is actually quite capable of
coping (through hinting). It seems like this would in fact seriously
compromise our "A" property, with any failure for any node in a token range
rapidly making the entire token range unavailable for writes*
Yes, this is something to be cautious about, which is why the existing design
frames the problem as "Don't accept more requests than I can compensate for
with hints." But that doesn't look like it's aggressive enough. I think there
is room to improve without swinging too far in the other direction.
bq. I'm wary about a feature like this, when we could simply improve our
current work shedding to make it more robust (MessagingService, MUTATION stage
and ExpiringMap all, effectively, shed; just not with sufficient
predictability).
The problem is that we need to give the clients better feedback so they know to
modify their behavior. Improving load shedding doesn't help with that if the
user isn't sufficiently familiar with C* internals.
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 2.2.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and if it reaches a high watermark disable read on client
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.
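
The watermark scheme in the description could be sketched roughly as below.
This is only an illustration, not code from Cassandra: the class name
InFlightLimiter, its method names, and the choice of a single lock are all
assumptions made for the example. It tracks outstanding requests and bytes,
reports "pause reads" once either count reaches its high watermark, and
reports "resume" only after both counts have dropped below their low
watermarks.

```java
/**
 * Hypothetical sketch of high/low watermark tracking for in-flight requests.
 * Names and structure are illustrative assumptions, not Cassandra internals.
 */
final class InFlightLimiter {
    private final long highBytes, lowBytes;
    private final int highRequests, lowRequests;

    private long bytes;          // outstanding request bytes
    private int requests;        // outstanding request count
    private boolean readsPaused; // true while client reads should stay disabled

    InFlightLimiter(long highBytes, long lowBytes, int highRequests, int lowRequests) {
        if (lowBytes > highBytes || lowRequests > highRequests)
            throw new IllegalArgumentException("low watermark must not exceed high watermark");
        this.highBytes = highBytes;
        this.lowBytes = lowBytes;
        this.highRequests = highRequests;
        this.lowRequests = lowRequests;
    }

    /** Record a newly accepted request; returns true if reads should be paused. */
    synchronized boolean onRequestStart(long requestBytes) {
        bytes += requestBytes;
        requests++;
        // Pause as soon as either counter reaches its high watermark.
        if (bytes >= highBytes || requests >= highRequests)
            readsPaused = true;
        return readsPaused;
    }

    /** Record a finished request; returns true if reads should remain paused. */
    synchronized boolean onRequestComplete(long requestBytes) {
        bytes -= requestBytes;
        requests--;
        // Resume only once both counters are back below their low watermarks.
        if (readsPaused && bytes < lowBytes && requests < lowRequests)
            readsPaused = false;
        return readsPaused;
    }

    synchronized boolean readsPaused() {
        return readsPaused;
    }
}
```

The gap between the two watermarks keeps the limiter from flapping between
paused and resumed on every request. A caller on a Netty-based transport
might react to the returned flag by toggling auto-read on client channels
(for example via channel.config().setAutoRead(false)); whether that has side
effects is the open question the description raises.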