[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053719#comment-15053719 ]

Ariel Weisberg commented on CASSANDRA-9318:
-------------------------------------------

I tried out limiting based on memory utilization. What I found was that, for the 
small amount of throughput I can get out of my desktop, the 2 second timeout is 
sufficient to evict and hint without OOMing. If I extend the timeout enough I can 
get an OOM because eviction doesn't keep up, which demonstrates that eviction 
has to take place to avoid OOM.

I see this change as useful in that it places a hard bound on the working set 
size, but it is not sufficient on its own.

Performance tanks so badly as the heap fills up with timed-out requests that 
evicting them is not a problem. If that is the case, maybe we should just evict 
them more aggressively so they don't have an impact on performance, possibly 
based on the perceived odds of receiving a response in a reasonable amount of 
time. It makes sense to me to use hinting as a way of getting the data off the 
heap and batching up the replication to slow nodes or DCs.

If we start evicting requests for a node, maybe we should take an adaptive 
approach and go straight to hinting for some percentage of that node's requests. 
If the requests that are not immediately hinted start succeeding, we can 
gradually increase the percentage that go straight to the node.

I am trying to think of alternatives that don't end up kicking back an error to 
the application. That's still an important capability to have, because growing 
hints forever is not great, but we can start by ensuring that the rest of the 
cluster can always operate at full speed even if a node is slow. Separately we 
can tackle bounding the resource utilization issues that presents.

Operationally, how do people feel about having many gigabytes' worth of hints to 
deliver? Is that useful, in that it allows things to continue until the slow 
node is addressed?

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x, 2.2.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding 
> bytes and requests and, if it reaches a high watermark, disable read on client 
> connections until it drops back below some low watermark.
> We need to make sure that disabling read on the client connections won't 
> introduce other issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
