[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605597#comment-14605597 ]
Benedict commented on CASSANDRA-9318:
-------------------------------------
bq. default timeout is 2s not 10, so actually fine in your example of 300MB vs
150MB/s x 2s
Looks like in 2.0 this was 10s, and it was hard-coded in the yaml, so anyone
upgrading from 2.0 or before likely still has a 10s timeout. So we should
assume this is by far the most common timeout.
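
To make the arithmetic concrete, here is a rough sketch of the worst-case bytes in flight when nothing completes before its timeout. The 150MB/s rate and the 300MB limit are taken from the quoted example; the function name is made up for illustration.

```python
def worst_case_in_flight_mb(rate_mb_per_s, timeout_s):
    """MB accepted during one timeout window, assuming no request completes early."""
    return rate_mb_per_s * timeout_s


LIMIT_MB = 300   # limit from the quoted example
RATE_MB_S = 150  # ingest rate from the quoted example

for timeout_s in (2, 10):
    in_flight = worst_case_in_flight_mb(RATE_MB_S, timeout_s)
    fits = in_flight <= LIMIT_MB
    print(f"timeout={timeout_s}s in_flight={in_flight}MB fits_in_limit={fits}")
```

Under that assumption the 2s timeout just fits in the limit, while the 10s timeout needs five times as much.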
bq. you don't see a complete halt until capacity's worth of requests timeout
all at once, because you don't get an entire capacity load accepted at once.
it's more continuous than discrete – you pause until the oldest expire, accept
more, pause until the oldest expire, etc. so you make slow progress as load
shedding can free up memory. thus, load shedding is complementary to flow
control.
You see a complete halt as soon as we exhaust space. If we exhaust space in
less than 0.5x the timeout, then we will see repeated juddering behaviour.
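
A minimal sketch of that worst case, assuming nothing accepted is freed until it times out (e.g. replicas are not responding) and load arrives at a constant rate. The function name and the constant-rate model are assumptions for illustration; the 0.5x threshold is simply the one stated above, not derived here.

```python
def stall_profile(limit_mb, rate_mb_per_s, timeout_s):
    """Return (seconds to fill the limit, seconds halted, fills in < 0.5x timeout)."""
    fill_s = limit_mb / rate_mb_per_s
    # The oldest request was accepted at t=0 and expires at t=timeout_s,
    # so once the limit is full we accept nothing until then.
    stall_s = max(0.0, timeout_s - fill_s)
    return fill_s, stall_s, fill_s < 0.5 * timeout_s


for timeout_s in (2, 10):
    fill_s, stall_s, judders = stall_profile(300, 150, timeout_s)
    print(f"timeout={timeout_s}s fill={fill_s}s stall={stall_s}s judders={judders}")
```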
bq. but we can easily set a higher limit on MS heap – maybe as high as 1/8 heap
as default which gives us a lot of room for 8GB heap
If we set this really _aggressively_ high, say min(1/4 heap, 1GB) until we
implement the improved shedding, then I'll quit complaining. Right now we give
breathing room up to and beyond collapse. I absolutely agree that breathing
room up until just-prior-to-collapse is preferable, but cutting our breathing
room by an order of magnitude reduces availability in clusters without their
opting into it. 1/4 heap probably still leaves quite a lot of headroom unused
that we would otherwise have safely used in a 2GB heap (such heaps are quite
feasible, and probably preferable, for many users running offheap memtables),
but it is still very unlikely to cause the server to completely collapse.
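
For comparison, a small sketch of what the two candidate defaults work out to for a couple of heap sizes. The helper names are hypothetical, and the heap sizes are just the 2GB and 8GB cases mentioned in this thread.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def quarter_heap_capped(heap_bytes):
    """min(1/4 heap, 1GB), the limit suggested in this comment."""
    return min(heap_bytes // 4, 1 * GB)


def eighth_heap(heap_bytes):
    """1/8 heap, the default floated in the quoted text."""
    return heap_bytes // 8


for heap_gb in (2, 8):
    heap = heap_gb * GB
    print(f"heap={heap_gb}GB "
          f"min(1/4 heap, 1GB)={quarter_heap_capped(heap) // MB}MB "
          f"1/8 heap={eighth_heap(heap) // MB}MB")
```

The two only differ for smaller heaps; at 8GB both come out to 1GB.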
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 2.1.x, 2.2.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and if it reaches a high watermark disable read on client
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.
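
The high/low watermark idea in the description could be sketched roughly as below. This is only an illustration, not the implementation proposed on the ticket: the class and method names are hypothetical, it tracks bytes only (not request counts), and `reads_enabled` stands in for whatever mechanism actually disables read on client connections.

```python
class InFlightLimiter:
    """Tracks outstanding request bytes and gates client reads with hysteresis."""

    def __init__(self, high_watermark, low_watermark):
        if not 0 <= low_watermark <= high_watermark:
            raise ValueError("need 0 <= low_watermark <= high_watermark")
        self.high = high_watermark
        self.low = low_watermark
        self.in_flight = 0
        self.reads_enabled = True

    def on_request(self, size):
        """Account for an accepted request; stop reading at the high watermark."""
        self.in_flight += size
        if self.in_flight >= self.high:
            self.reads_enabled = False
        return self.reads_enabled

    def on_complete(self, size):
        """Account for a finished or expired request; resume at the low watermark."""
        self.in_flight -= size
        if not self.reads_enabled and self.in_flight <= self.low:
            self.reads_enabled = True
        return self.reads_enabled


limiter = InFlightLimiter(high_watermark=100, low_watermark=50)
limiter.on_request(60)   # below the high watermark, still reading
limiter.on_request(60)   # 120 >= 100, reads are disabled
limiter.on_complete(60)  # 60 > 50, reads stay disabled
limiter.on_complete(60)  # 0 <= 50, reads are enabled again
```

The gap between the two watermarks is what keeps reads from being toggled on every single request once the limit is reached.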