[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

Jonathan Ellis (JIRA) Tue, 12 Jul 2016 12:24:41 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373512#comment-15373512
 ]


Jonathan Ellis edited comment on CASSANDRA-9318 at 7/12/16 7:23 PM:
--------------------------------------------------------------------

The more I think about it, the more I think the entire approach may be a bad fit 
for Cassandra.  Consider:

1. If a node has a "hiccup" of slow performance, e.g. due to a GC pause, we want 
to hint those writes and return success to the client.  No need to rate limit.
2. If a node has a sustained period of slow performance, we want to hint those 
writes and return success to the client.  No need to rate limit, unless we are 
overwhelmed with hints.  (Not sure if hint overload is actually a problem with 
the new file-based hints.)
3. Where we DO want to rate limit is when the client is throwing more updates at 
the coordinator than the system can handle, whether that is for a single token 
range or globally across all nodes.

So I see this approach as doing the wrong thing for 1 and 2 and only partially 
helping with 3.

Put another way: we do NOT want to limit performance to the slowest node in a 
set of replicas.  That is kind of the opposite of the redundancy we want to 
provide.
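
As a rough illustration of cases 1 and 2, here is a minimal sketch of a coordinator that acknowledges the client once enough replicas have responded and records a hint for any replica that did not respond in time. Everything in it (the class HintedWriteSketch, the coordinate method, the latency map, the requiredAcks parameter) is hypothetical and is not taken from Cassandra's write path; it only shows that a slow replica need not set the pace for the client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class HintedWriteSketch {
    /** Hypothetical result: whether the client sees success, and which replicas were hinted. */
    public static final class Result {
        public final boolean success;
        public final List<String> hinted;

        Result(boolean success, List<String> hinted) {
            this.success = success;
            this.hinted = hinted;
        }
    }

    /**
     * Simulates one write. replicaLatencyMs maps a replica name to how long it
     * would take to acknowledge (an assumption made for illustration only).
     * Replicas slower than timeoutMs are hinted instead of waited for.
     */
    public static Result coordinate(Map<String, Long> replicaLatencyMs,
                                    long timeoutMs,
                                    int requiredAcks) {
        int acks = 0;
        List<String> hinted = new ArrayList<>();
        for (Map.Entry<String, Long> e : replicaLatencyMs.entrySet()) {
            if (e.getValue() <= timeoutMs) {
                acks++;                 // replica answered in time
            } else {
                hinted.add(e.getKey()); // store a hint, do not block the client
            }
        }
        // The client sees success as long as enough replicas acknowledged.
        return new Result(acks >= requiredAcks, hinted);
    }
}
```

With three replicas, one of which is stuck in a long pause, and two acknowledgements required, the write in this sketch still succeeds and only the paused replica is hinted.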


was (Author: jbellis):
The more I think about it, the more I think the entire approach may be a bad fit 
for Cassandra.  Consider:

1. If a node has a "hiccup" of slow performance, e.g. due to a GC pause, we want 
to hint those writes and return success to the client.  No need to rate limit.
2. If a node has a sustained period of slow performance, we want to hint those 
writes and return success to the client.  No need to rate limit, unless we are 
overwhelmed with hints.  (Not sure if hint overload is actually a problem with 
the new file-based hints.)
3. Where we DO want to rate limit is when the client is throwing more updates at 
the coordinator than the system can handle, whether that is for a single token 
range or globally across many nodes.

So I see this approach as doing the wrong thing for 1 and 2 and only partially 
helping with 3.

Put another way: we do NOT want to limit performance to the slowest node in a 
set of replicas.  That is kind of the opposite of the redundancy we want to 
provide.

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Sergio Bossa
>         Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, 
> limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding 
> bytes and requests and if it reaches a high watermark disable read on client 
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't 
> introduce other issues.
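
The high/low watermark idea in the description can be sketched roughly as follows. This is a minimal illustration, not the patch attached to the ticket: the class InFlightLimiter, its method names, and the choice to resume only once both counters are at or below their low watermarks are assumptions. In a Netty-based server the returned flag would presumably drive something like channel.config().setAutoRead(!paused) on client connections, which is also an assumption here.

```java
public class InFlightLimiter {
    private final long lowRequests, highRequests;
    private final long lowBytes, highBytes;

    private long inFlightRequests = 0;
    private long inFlightBytes = 0;
    private boolean readsPaused = false;

    public InFlightLimiter(long lowRequests, long highRequests,
                           long lowBytes, long highBytes) {
        if (lowRequests > highRequests || lowBytes > highBytes) {
            throw new IllegalArgumentException("low watermark must not exceed high watermark");
        }
        this.lowRequests = lowRequests;
        this.highRequests = highRequests;
        this.lowBytes = lowBytes;
        this.highBytes = highBytes;
    }

    /** Call when a request is accepted. Returns true if client reads should be paused. */
    public synchronized boolean onRequestStart(long bytes) {
        inFlightRequests++;
        inFlightBytes += bytes;
        // Pause as soon as either counter reaches its high watermark.
        if (inFlightRequests >= highRequests || inFlightBytes >= highBytes) {
            readsPaused = true;
        }
        return readsPaused;
    }

    /** Call when a request completes. Returns true if client reads should stay paused. */
    public synchronized boolean onRequestComplete(long bytes) {
        inFlightRequests--;
        inFlightBytes -= bytes;
        // Resume only once both counters have drained to their low watermarks,
        // so the limiter does not flap around a single threshold.
        if (readsPaused && inFlightRequests <= lowRequests && inFlightBytes <= lowBytes) {
            readsPaused = false;
        }
        return readsPaused;
    }

    public synchronized boolean isReadsPaused() {
        return readsPaused;
    }
}
```

The gap between the two watermarks provides hysteresis: reads stay disabled until a meaningful amount of outstanding work has completed, rather than toggling on every request.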



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
