[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054761#comment-15054761
 ] 

Ariel Weisberg commented on CASSANDRA-9318:
-------------------------------------------

I got two cstar jobs to complete.

[This job is set to allow 16 megabytes of transactions per coordinator, and 
disables reads until they come back down to 12 
megabytes.|http://cstar.datastax.com/graph?command=one_job&stats=d1e720c8-a125-11e5-9051-0256e416528f&metric=op_rate&operation=1_write&smoothing=1&show_aggregates=true&xmin=0&xmax=6664.35&ymin=0&ymax=11883.3]
[This job is set to allow 64 megabytes of transactions per coordinator, and 
disables reads until they come back down to 60 
megabytes.|http://cstar.datastax.com/graph?command=one_job&stats=26853362-a127-11e5-80c2-0256e416528f&metric=op_rate&operation=1_write&smoothing=1&show_aggregates=true&xmin=0&xmax=322.85&ymin=0&ymax=12972.3]

The job that allows 64 megabytes in flight looks like it failed after about 
300 seconds. I didn't expect the threshold for things to fall apart to be 
quite that low, but generally speaking, more data in flight tends to cause 
bad things to happen.

So why did the second one fall apart? First off, mad props to whoever started 
collecting the GC logs. Lots of continual full GC at the end. Sure enough, the 
heap is only 1 gigabyte. Are we seriously running all our performance tests 
with a default heap of 1 gigabyte?

I don't think it failed due to in-flight requests (it only had 32 megabytes in 
flight). I think it OOMed due to other heap pressure. For this in-flight 
request backpressure to work I think we need to include the weight of memtables 
when making the decision. I am going to bump up the heap and try again to see 
if I can reduce the impact of other heap pressure to the point that we can 
start buffering more requests in flight.
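
A rough sketch of what such a check could look like, assuming a high/low 
watermark pair in bytes plus an extra weight term for other heap consumers 
such as memtables. The class name InFlightLimiter, its method names, and the 
otherHeapBytes supplier are all hypothetical and only illustrate the idea; 
this is not code from the patch.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical illustration of high/low watermark backpressure. */
final class InFlightLimiter {
    private final long highWatermarkBytes;
    private final long lowWatermarkBytes;
    // Assumption: some way to ask how much heap other consumers
    // (for example memtables) are holding right now.
    private final LongSupplier otherHeapBytes;
    private final AtomicLong inFlightBytes = new AtomicLong();
    private boolean readsPaused = false;

    InFlightLimiter(long highWatermarkBytes, long lowWatermarkBytes,
                    LongSupplier otherHeapBytes) {
        if (lowWatermarkBytes > highWatermarkBytes) {
            throw new IllegalArgumentException("low watermark above high watermark");
        }
        this.highWatermarkBytes = highWatermarkBytes;
        this.lowWatermarkBytes = lowWatermarkBytes;
        this.otherHeapBytes = otherHeapBytes;
    }

    /** Call when a request is accepted; returns true if client reads should be paused. */
    boolean onRequestStart(long bytes) {
        inFlightBytes.addAndGet(bytes);
        return updateState();
    }

    /** Call when a request completes; returns true if client reads should stay paused. */
    boolean onRequestDone(long bytes) {
        inFlightBytes.addAndGet(-bytes);
        return updateState();
    }

    private synchronized boolean updateState() {
        long weight = inFlightBytes.get() + otherHeapBytes.getAsLong();
        if (!readsPaused && weight >= highWatermarkBytes) {
            // Reached the high watermark: stop reading from client connections.
            readsPaused = true;
        } else if (readsPaused && weight <= lowWatermarkBytes) {
            // Drained down to the low watermark: resume reading.
            readsPaused = false;
        }
        return readsPaused;
    }
}
```

The gap between the two watermarks keeps reads from flapping on and off 
around a single threshold. Passing a supplier that always returns 0 reduces 
this to counting in-flight request bytes alone.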

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x, 2.2.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding 
> bytes and requests and if it reaches a high watermark disable read on client 
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't 
> introduce other issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
