[
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053719#comment-15053719
]
Ariel Weisberg commented on CASSANDRA-9318:
-------------------------------------------
I tried out limiting based on memory utilization. What I found is that, for the
small amount of throughput I can get out of my desktop, the 2 second timeout is
sufficient to evict and hint without OOMing. If I extend the timeout enough I
can get an OOM because eviction doesn't keep up, which demonstrates that
eviction has to take place to avoid OOM.

I see this change as useful in that it places a hard bound on the working set
size, but it is not sufficient on its own.

Performance tanks so badly as the heap fills up with timed-out requests that
evicting them is not a problem. If that is the case, maybe we should just evict
them more aggressively so they don't have an impact on performance, possibly
based on the perceived odds of receiving a response in a reasonable amount of
time. It makes sense to me to use hinting as a way of getting the data off the
heap and batching the replication to slow nodes or DCs.

If we start evicting requests for a node, maybe we should take an adaptive
approach and go straight to hinting for the slow node for some percentage of
requests. If the non-immediately-hinted requests start succeeding, we can
gradually increase the percentage that go straight to the node.
I am trying to think of alternatives that don't end up kicking back an error to
the application. That's still an important capability to have, because growing
hints forever is not great, but we can start by ensuring that the rest of the
cluster can always operate at full speed even if a node is slow. Separately we
can tackle bounding the resource utilization issues that presents.

Operationally how do people feel about having many gigabytes worth of hints to
deliver? Is that useful in that it allows things to continue until the slow
node is addressed?
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths, Streaming and Messaging
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 2.1.x, 2.2.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and, if it reaches a high watermark, disable read on client
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.
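
As a thought experiment, here is a sketch of what that watermark bookkeeping
could look like against a Netty-style channel, where setAutoRead(false) stops
reading from the socket. The class name and thresholds are made up, and for
simplicity it only pauses and resumes the channel that carried the request; a
real implementation would have to resume every paused connection and confirm
that pausing reads doesn't introduce other issues.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

import io.netty.channel.Channel;

/**
 * Hypothetical coordinator-side limiter: tracks in-flight request bytes and
 * stops reading from a client connection once a high watermark is crossed,
 * resuming only after the total drops back below a low watermark.
 */
class InflightRequestLimiter
{
    private static final long HIGH_WATERMARK_BYTES = 256L * 1024 * 1024;
    private static final long LOW_WATERMARK_BYTES  = 192L * 1024 * 1024;

    private final AtomicLong inflightBytes = new AtomicLong();

    /** Called when a request is admitted; may pause reads on the channel. */
    void onRequestStart(Channel channel, long requestBytes)
    {
        if (inflightBytes.addAndGet(requestBytes) >= HIGH_WATERMARK_BYTES)
            channel.config().setAutoRead(false);   // stop reading new requests
    }

    /** Called when a request completes (response sent, hinted, or evicted). */
    void onRequestComplete(Channel channel, long requestBytes)
    {
        if (inflightBytes.addAndGet(-requestBytes) <= LOW_WATERMARK_BYTES)
            channel.config().setAutoRead(true);    // resume reading
    }
}
{code}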
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)