[
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054761#comment-15054761
]
Ariel Weisberg commented on CASSANDRA-9318:
-------------------------------------------
I got two cstar jobs to complete.
[This job allows 16 megabytes of in-flight transactions per coordinator and
disables reads until they come back down to 12
megabytes.|http://cstar.datastax.com/graph?command=one_job&stats=d1e720c8-a125-11e5-9051-0256e416528f&metric=op_rate&operation=1_write&smoothing=1&show_aggregates=true&xmin=0&xmax=6664.35&ymin=0&ymax=11883.3]
[This job allows 64 megabytes of in-flight transactions per coordinator and
disables reads until they come back down to 60
megabytes.|http://cstar.datastax.com/graph?command=one_job&stats=26853362-a127-11e5-80c2-0256e416528f&metric=op_rate&operation=1_write&smoothing=1&show_aggregates=true&xmin=0&xmax=322.85&ymin=0&ymax=12972.3]
The job with 64 megabytes in flight looks like it failed after 300
seconds. I didn't expect the threshold for things to fall apart to be quite
that low, but generally speaking, yes, more data in flight tends to cause bad
things to happen.
So why did the second one fall apart? First off, mad props to whoever started
collecting the GC logs. Lots of continual full GC at the end. Sure enough, the
heap is only 1 gigabyte. Are we seriously running all our performance tests
with a default heap of 1 gigabyte?
I don't think it failed due to in-flight requests (it only had 32 megabytes in
flight). I think it OOMed due to other heap pressure. For this in-flight
request backpressure to work, I think we need to include the weight of memtables
when making the decision. I am going to bump up the heap and try again to see
if I can reduce the impact of other heap pressure to the point that we can
start buffering more requests in flight.
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths, Streaming and Messaging
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 2.1.x, 2.2.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and if it reaches a high watermark disable read on client
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.
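
The high/low watermark scheme described above could be sketched roughly as
follows. This is a minimal illustration, not the patch attached to this ticket:
the class name InFlightLimiter and its methods are hypothetical, it tracks only
bytes (not request counts), and the caller is assumed to toggle reads on the
client connections based on readsDisabled().

```java
/**
 * Hypothetical sketch of watermark-based backpressure at a coordinator.
 * Names and structure are assumptions for illustration only.
 */
final class InFlightLimiter {
    private final long highWatermarkBytes;
    private final long lowWatermarkBytes;
    private long inFlightBytes = 0;
    private boolean readsDisabled = false;

    InFlightLimiter(long highWatermarkBytes, long lowWatermarkBytes) {
        if (lowWatermarkBytes < 0 || lowWatermarkBytes > highWatermarkBytes) {
            throw new IllegalArgumentException("need 0 <= low <= high");
        }
        this.highWatermarkBytes = highWatermarkBytes;
        this.lowWatermarkBytes = lowWatermarkBytes;
    }

    /** Record a newly accepted request; disable reads once the high watermark is reached. */
    synchronized void onRequestStart(long bytes) {
        inFlightBytes += bytes;
        if (inFlightBytes >= highWatermarkBytes) {
            readsDisabled = true;
        }
    }

    /** Record a completed request; re-enable reads once below the low watermark. */
    synchronized void onRequestComplete(long bytes) {
        inFlightBytes -= bytes;
        if (readsDisabled && inFlightBytes < lowWatermarkBytes) {
            readsDisabled = false;
        }
    }

    /** True while the caller should keep reads on client connections disabled. */
    synchronized boolean readsDisabled() {
        return readsDisabled;
    }

    synchronized long inFlightBytes() {
        return inFlightBytes;
    }
}
```

With the settings from the first job, this would be constructed as
new InFlightLimiter(16L << 20, 12L << 20). The gap between the two watermarks
keeps the coordinator from flapping between enabled and disabled reads on every
request. Note that this only accounts for request bytes; it does not include
other heap pressure such as memtables.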