[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605597#comment-14605597
 ] 

Benedict commented on CASSANDRA-9318:
-------------------------------------

bq. default timeout is 2s not 10, so actually fine in your example of 300MB vs 
150MB/s x 2s

Looks like in 2.0 this was 10s, and it was hard-coded in the yaml, so anyone 
upgrading from 2.0 or earlier likely still has a 10s timeout. So we should 
assume this is by far the most common timeout.
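
To make the arithmetic concrete, here is a rough sketch. It assumes the worst 
case in which nothing completes before timing out, so the backlog is simply 
throughput x timeout. The 150MB/s rate and 300MB limit are the figures from 
the quoted example; the class and method names are made up for illustration.

```java
final class TimeoutBacklog {
    /** Worst-case MB held in flight if no request completes before its timeout. */
    static long worstCaseBacklogMb(long throughputMbPerSec, long timeoutSec) {
        return throughputMbPerSec * timeoutSec;
    }

    public static void main(String[] args) {
        long limitMb = 300;        // limit from the quoted example
        long rateMbPerSec = 150;   // ingest rate from the quoted example
        for (long timeoutSec : new long[] {2, 10}) {
            long backlogMb = worstCaseBacklogMb(rateMbPerSec, timeoutSec);
            System.out.printf("timeout=%ds backlog=%dMB limit=%dMB fits=%b%n",
                              timeoutSec, backlogMb, limitMb, backlogMb <= limitMb);
        }
    }
}
```

With a 2s timeout the backlog just fits the 300MB limit; with a 10s timeout it 
is five times larger.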

bq. you don't see a complete halt until capacity's worth of requests timeout 
all at once, because you don't get an entire capacity load accepted at once. 
it's more continuous than discrete – you pause until the oldest expire, accept 
more, pause until the oldest expire, etc. so you make slow progress as load 
shedding can free up memory. thus, load shedding is complementary to flow 
control.

You see a complete halt as soon as we exhaust space. If we exhaust space in < 
0.5x timeout, then we will see repeated juddering behaviour.

bq. but we can easily set a higher limit on MS heap – maybe as high as 1/8 heap 
as default which gives us a lot of room for 8GB heap

If we set this really _aggressively_ high, say min(1/4 heap, 1GB), until we 
implement the improved shedding, then I'll quit complaining. Right now we give 
breathing room up to and beyond collapse. I absolutely agree that breathing 
room up until just-prior-to-collapse is preferable, but cutting our breathing 
room by an order of magnitude reduces availability in clusters that have not 
opted into it. With a 2GB heap (which is quite feasible, and probably 
preferable, for many users running offheap memtables), 1/4 heap probably still 
leaves unused quite a lot of headroom we would otherwise have safely used, but 
it is still very unlikely to cause the server to completely collapse.
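
For illustration only, the min(1/4 heap, 1GB) default could be computed along 
these lines. The class and method names are hypothetical and this is not taken 
from any patch on the ticket.

```java
final class MessagingHeapLimit {
    static final long ONE_GB = 1L << 30;

    /** Proposed default: a quarter of the max heap, capped at 1GB. */
    static long limitBytes(long maxHeapBytes) {
        return Math.min(maxHeapBytes / 4, ONE_GB);
    }

    public static void main(String[] args) {
        // maxMemory() reports the JVM's max heap (Long.MAX_VALUE if unbounded).
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("limit bytes: " + limitBytes(maxHeap));
    }
}
```

A 2GB heap gives a 512MB limit under this rule, and any heap of 4GB or more 
hits the 1GB cap.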


> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x, 2.2.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding 
> bytes and requests and if it reaches a high watermark disable read on client 
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't 
> introduce other issues.
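
A minimal sketch of the high/low watermark idea in the description above, 
assuming only outstanding bytes are tracked (the description also mentions 
request counts). All names here are hypothetical. The returned flag is what a 
caller would use to disable or re-enable reads on client connections; that 
wiring is left out.

```java
final class InflightLimiter {
    private final long highWatermark;
    private final long lowWatermark;
    private long inflightBytes;   // guarded by "this"
    private boolean readsPaused;  // guarded by "this"

    InflightLimiter(long highWatermark, long lowWatermark) {
        if (lowWatermark > highWatermark)
            throw new IllegalArgumentException("low watermark above high watermark");
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    /** Record an accepted request; returns true if reads should now be paused. */
    synchronized boolean onRequestStart(long bytes) {
        inflightBytes += bytes;
        if (inflightBytes >= highWatermark)
            readsPaused = true;
        return readsPaused;
    }

    /** Record a completed or expired request; returns true if reads stay paused. */
    synchronized boolean onRequestDone(long bytes) {
        inflightBytes -= bytes;
        // Resume only once we drop to the low watermark, to avoid flapping.
        if (inflightBytes <= lowWatermark)
            readsPaused = false;
        return readsPaused;
    }
}
```

The gap between the two watermarks is the hysteresis: reads pause at the high 
mark and stay paused until enough work has drained to reach the low mark.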



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
