[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

Benedict (JIRA) Sat, 09 May 2015 10:24:36 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536307#comment-14536307 ]


Benedict edited comment on CASSANDRA-9318 at 5/9/15 5:23 PM:
-------------------------------------------------------------

bq. Where? Are you talking about the hint limit?

I was, and I realise that was a mistake; I didn't fully understand the existing 
logic (and your proposal took me by surprise). Now that I do, I think I 
understand what you are proposing. There are a few problems that I see with it, 
though:

# the cluster as a whole, especially in large clusters, can still send a _lot_ 
of requests to a single node
# it has the opposite impact of (and likely prevents) CASSANDRA-3852, with 
older operations completely blocking newer ones 
# it might mean a lot more OE (OverloadedException) than users are used to 
during temporary blips, pushing problems down to clients, when the cluster is 
actually quite capable of coping (through hinting)
#* It seems like this would in fact seriously compromise our "A" property, with 
any failure for any node in a token range rapidly making the entire token range 
unavailable for writes\*
# tuning it is hard; network latencies, query processing times, and cluster 
size (which changes over time) will each impact it

I'm wary about a feature like this, when we could simply improve our current 
work shedding to make it more robust (MessagingService, MUTATION stage and 
ExpiringMap all, effectively, shed; just not with sufficient predictability), 
but I think I've made all my concerns sufficiently clear so I'll leave it with 
you.

\* At the very least we would have to first fall back to hints, rather than 
throwing OE, and only throw once hints saturate (AFAICT). In which case we're 
_in effect_ introducing "LIFO-leaky" pruning of the ExpiringMap, MS, and the 
receiving node's MUTATION stage, but under a new mechanism (as opposed to 
inline pruning, which could be FIFO; tbd). I don't really have anything 
against this, since it is functionally equivalent, but I would still rather 
improve the existing mechanisms: I think FIFO-pruning is preferable; having 
fewer pruning mechanisms is probably preferable; the existing mechanisms would 
apply more universally; and they would insulate the node from the many-to-one 
effect (by making the MUTATION stage itself robust to overload).
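
To make the two shedding styles concrete, here is a minimal sketch under one 
possible reading of the terms above: "reject newest" refuses work at admission 
once a bound is hit (so older operations block newer ones), while "prune 
oldest" evicts the longest-queued entry to make room. The class SheddingQueue 
and its Policy names are hypothetical and are not Cassandra code; the real 
MessagingService, ExpiringMap and MUTATION stage shed by different means.

```java
import java.util.ArrayDeque;

/** Hypothetical bounded queue illustrating two shedding policies. Not Cassandra code. */
final class SheddingQueue<T> {
    enum Policy { REJECT_NEWEST, PRUNE_OLDEST }

    private final ArrayDeque<T> queue = new ArrayDeque<>();
    private final int capacity;
    private final Policy policy;

    SheddingQueue(int capacity, Policy policy) {
        if (capacity <= 0)
            throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        this.policy = policy;
    }

    /**
     * Adds an item, shedding work if the queue is full.
     * Returns the item that was shed (possibly the new item itself),
     * or null if nothing had to be shed.
     */
    synchronized T offer(T item) {
        if (queue.size() < capacity) {
            queue.addLast(item);
            return null;
        }
        if (policy == Policy.REJECT_NEWEST)
            return item;                // the caller would surface an overload error
        T oldest = queue.pollFirst();   // evict the longest-waiting entry
        queue.addLast(item);
        return oldest;
    }

    /** Removes and returns the oldest queued item, or null if the queue is empty. */
    synchronized T poll() {
        return queue.pollFirst();
    }

    synchronized int size() {
        return queue.size();
    }
}
```

With PRUNE_OLDEST the queue always holds the most recent work, so the entries 
least likely to have already timed out at the client are the ones that survive.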



> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding 
> bytes and requests and if it reaches a high watermark disable read on client 
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't 
> introduce other issues.
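
The watermark scheme in the issue description above could be sketched roughly 
as follows. This is an illustration only, not the proposed patch: the class 
InFlightLimiter, its method names, and the choice to resume only once both 
counters are strictly below their low watermarks are all assumptions. Hooking 
the returned flag up to the client connections (for example by toggling 
auto-read on the channel) is left out.

```java
/** Hypothetical tracker for in-flight requests and bytes with high/low watermarks. */
final class InFlightLimiter {
    private final long lowRequests, highRequests, lowBytes, highBytes;
    private long requests;
    private long bytes;
    private boolean readsPaused;

    InFlightLimiter(long lowRequests, long highRequests, long lowBytes, long highBytes) {
        if (lowRequests > highRequests || lowBytes > highBytes)
            throw new IllegalArgumentException("low watermark must not exceed high watermark");
        this.lowRequests = lowRequests;
        this.highRequests = highRequests;
        this.lowBytes = lowBytes;
        this.highBytes = highBytes;
    }

    /** Records an accepted request; returns true if client reads should be paused. */
    synchronized boolean onRequestStart(long size) {
        requests++;
        bytes += size;
        // Reaching either high watermark pauses reads.
        if (requests >= highRequests || bytes >= highBytes)
            readsPaused = true;
        return readsPaused;
    }

    /** Records a completed request; returns true if client reads should stay paused. */
    synchronized boolean onRequestDone(long size) {
        requests--;
        bytes -= size;
        // Resume only once both counters are back below their low watermarks,
        // which gives hysteresis and avoids flapping around a single threshold.
        if (readsPaused && requests < lowRequests && bytes < lowBytes)
            readsPaused = false;
        return readsPaused;
    }

    synchronized boolean readsPaused() {
        return readsPaused;
    }

    public static void main(String[] args) {
        InFlightLimiter limiter = new InFlightLimiter(2, 4, 100, 1000);
        for (int i = 0; i < 4; i++)
            System.out.println("start -> paused=" + limiter.onRequestStart(10));
        for (int i = 0; i < 4; i++)
            System.out.println("done  -> paused=" + limiter.onRequestDone(10));
    }
}
```

Note that a limiter like this only bounds what a single coordinator admits; it 
does not by itself bound what the rest of the cluster sends to one replica.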



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
