[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604649#comment-14604649
 ] 

Jonathan Ellis edited comment on CASSANDRA-9318 at 6/28/15 12:08 PM:
---------------------------------------------------------------------

Here's where I've ended up:

# Continuing to accept writes faster than a coordinator can deliver them to 
replicas is bad.  Even perfect load shedding is worse from the client's 
perspective than throttling, since if we load shed and time out, the client 
has to guess the "right" rate to retry at.
# For the same reason, accepting a write but then refusing it with 
UnavailableException is worse than waiting to accept the write until we have 
capacity for it.
# It's more important to throttle writes because, while we can get in trouble 
with large reads too (a small request turns into a big reply), in practice 
reads are naturally throttled: a client has to wait for a read to complete 
before acting on it.  With writes, on the other hand, a new user's first 
inclination is to see how fast they can bulk load data.

In practice, I see load shedding and throttling as complementary.  Replicas 
can continue to rely on load shedding.  Perhaps we can attempt distributed 
backpressure later (if every replica is overloaded, we should again throttle 
clients), but for now let's narrow our scope to throttling clients to the 
capacity of the coordinator to send requests out.

*I propose we define a limit on the amount of memory MessagingService can 
consume and pause reading additional requests whenever that limit is hit.*  
Note that:

# If MS's load is distributed evenly across all destinations, then this is 
trivially the right thing to do.
# If MS's load is caused by a single replica falling over or being unable to 
keep up, this is still the right thing to do, because the alternative is 
worse.  MS will load-shed timed-out requests, but if clients are sending more 
requests to a single replica than we can shed (i.e. if rate * timeout > 
capacity), then we still need to throttle or we will exhaust the heap and 
fall over.
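To put rough numbers on the rate * timeout > capacity condition in point 2 
(all figures here are illustrative assumptions, not from this ticket):

```java
// Illustrative back-of-the-envelope for "rate * timeout > capacity".
// The rates, timeout, and mutation size below are assumptions for the
// sake of the example, not measured Cassandra numbers.
public class InFlightEstimate {
    /** Messages the coordinator holds for a replica before timeouts free them. */
    static long inFlightMessages(long writesPerSec, long timeoutSec) {
        return writesPerSec * timeoutSec;
    }

    /** Heap retained by those messages, given an average mutation size. */
    static long heapHeldBytes(long writesPerSec, long timeoutSec, long avgMutationBytes) {
        return inFlightMessages(writesPerSec, timeoutSec) * avgMutationBytes;
    }

    public static void main(String[] args) {
        // 50k writes/s aimed at one stalled replica, 2s write timeout, ~1 KB mutations:
        long msgs = inFlightMessages(50_000, 2);      // 100,000 messages in flight
        long heap = heapHeldBytes(50_000, 2, 1_024);  // ~100 MB retained
        System.out.println(msgs + " messages, " + heap + " bytes");
    }
}
```

Shedding frees a message only once it times out, so a sustained rate keeps 
that much heap pinned no matter how aggressively we shed.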

(The hint-based UnavailableException tries to help with scenario 2, and I will 
open a ticket to test how well that actually works.  But the hint threshold 
cannot help with scenario 1 at all and that is the hole this ticket needs to 
plug.)
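A minimal sketch of the proposed mechanism, with hypothetical names (this is 
not Cassandra's actual MessagingService code), and using the high/low 
watermark refinement from the issue description so reads don't flap on and 
off around a single threshold:

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of bounding MessagingService memory: the coordinator
 * reserves bytes as it reads each request off a client connection and
 * releases them when the request completes (or is load-shed).  Reading is
 * paused at the high watermark and resumed at the low watermark.
 */
public class InFlightByteLimiter {
    private final long highWatermark;
    private final long lowWatermark;
    private final AtomicLong inFlightBytes = new AtomicLong();
    private volatile boolean paused = false;

    public InFlightByteLimiter(long highWatermark, long lowWatermark) {
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    /** Called as each request is read off a client connection. */
    public void reserve(long bytes) {
        if (inFlightBytes.addAndGet(bytes) >= highWatermark)
            paused = true;                 // stop reading further requests
    }

    /** Called when a request's replies are sent or it is load-shed. */
    public void release(long bytes) {
        if (inFlightBytes.addAndGet(-bytes) <= lowWatermark)
            paused = false;                // safe to read again
    }

    /** The read loop checks this before pulling the next request. */
    public boolean readsPaused() {
        return paused;
    }
}
```

While reads are paused the kernel's socket buffers fill, so ordinary TCP 
backpressure propagates the throttle to the client without an explicit 
error response.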


> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.2.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding 
> bytes and requests and if it reaches a high watermark disable read on client 
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't 
> introduce other issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
