[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

Jonathan Ellis (JIRA) Tue, 12 May 2015 08:11:36 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539990#comment-14539990 ]


Jonathan Ellis commented on CASSANDRA-9318:
-------------------------------------------

bq. the cluster as a whole, especially in large clusters, can still send a lot 
of requests to a single node

Yes.  But I still think this is useful:

# If you hit OverloadedException (OE) while bulk loading into your initial 
cluster of dozens of nodes, you will remember that lesson as you grow into a 
large one of hundreds.
# Even if we can't keep a replica from needing to load shed after a GC pause, 
we can keep the *coordinator* from falling over, which is what turns a 
single-node hiccup into a cluster-wide problem.

bq. it has the opposite impact of (and likely prevents) CASSANDRA-3852, with 
older operations completely blocking newer ones

IMO, refusing requests that we won't be able to process is preferable from a 
user's perspective anyway.
 
bq. it might mean a lot more OE than users are used to during temporary blips, 
pushing problems down to clients, when the cluster is actually quite capable of 
coping (through hinting).  It seems like this would in fact seriously 
compromise our "A" property, with any failure for any node in a token range 
rapidly making the entire token range unavailable for writes*

Yes, this is something to be cautious about, which is why the existing design 
frames the problem as "don't accept more requests than I can compensate for 
with hints."  But it doesn't look like that's aggressive enough.  I think there 
is room to improve without swinging too far in the other direction.

bq. I'm wary about a feature like this, when we could simply improve our 
current work shedding to make it more robust (MessagingService, MUTATION stage 
and ExpiringMap all, effectively, shed; just not with sufficient 
predictability).

The problem is that we need to give the clients better feedback so they know to 
modify their behavior.  Improving load shedding doesn't help with that if the 
user isn't sufficiently familiar with C* internals.

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.2.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might track the number of outstanding bytes and requests 
> and, if either reaches a high watermark, disable reads on client connections 
> until it drops back below a low watermark.
> We need to make sure that disabling reads on the client connection won't 
> introduce other issues.
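
The high/low watermark idea in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed Cassandra implementation: the class name `InFlightLimiter`, its method names, and the thresholds are all hypothetical, and the sketch is single-threaded. In a Netty-based server the `reads_enabled` flag would presumably map to toggling auto-read on client channels, but that wiring is not shown here.

```python
class InFlightLimiter:
    """Hypothetical tracker for outstanding requests and bytes.

    Reads are disabled once either counter reaches its high watermark and
    are re-enabled only after both counters fall below their low watermarks.
    Not thread-safe; a real coordinator would need atomic counters.
    """

    def __init__(self, high_requests, low_requests, high_bytes, low_bytes):
        if low_requests > high_requests or low_bytes > high_bytes:
            raise ValueError("low watermark must not exceed high watermark")
        self.high_requests = high_requests
        self.low_requests = low_requests
        self.high_bytes = high_bytes
        self.low_bytes = low_bytes
        self.requests = 0          # requests currently in flight
        self.bytes = 0             # request bytes currently in flight
        self.reads_enabled = True  # whether client connections may be read

    def on_request_start(self, size):
        """Account for a newly accepted request of `size` bytes."""
        self.requests += 1
        self.bytes += size
        if self.requests >= self.high_requests or self.bytes >= self.high_bytes:
            self.reads_enabled = False
        return self.reads_enabled

    def on_request_complete(self, size):
        """Account for a finished request of `size` bytes."""
        self.requests -= 1
        self.bytes -= size
        below_low = (self.requests < self.low_requests
                     and self.bytes < self.low_bytes)
        if not self.reads_enabled and below_low:
            self.reads_enabled = True
        return self.reads_enabled


# Illustrative thresholds only; real values would need tuning.
limiter = InFlightLimiter(high_requests=3, low_requests=2,
                          high_bytes=1000, low_bytes=500)
for _ in range(3):
    limiter.on_request_start(100)   # third request hits the high watermark
limiter.on_request_complete(100)    # still at the low watermark, stays paused
limiter.on_request_complete(100)    # below the low watermark, reads resume
```

The gap between the two watermarks keeps the coordinator from flapping between paused and resumed on every request. Whether pausing reads is safe for clients is exactly the open question raised in the description.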



