[ https://issues.apache.org/jira/browse/CASSANDRA-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15554366#comment-15554366 ]
Corentin Chary commented on CASSANDRA-11380:
--------------------------------------------
Looks like a good start. I'll try to test this with my workload and publish the
results. Thanks for the link.
> Client visible backpressure mechanism
> -------------------------------------
>
> Key: CASSANDRA-11380
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11380
> Project: Cassandra
> Issue Type: New Feature
> Components: Coordination
> Reporter: Wei Deng
>
> Cassandra currently lacks a sophisticated backpressure mechanism to prevent
> clients from ingesting data at too high a throughput. One reason is its SEDA
> (Staged Event-Driven Architecture) design. With SEDA, an overloaded thread
> pool can drop droppable messages (in this case, MutationStage can drop
> mutation or counter mutation messages) when they exceed the 2-second timeout.
> This can save the JVM from running out of memory and crashing. However, one
> downside of this kind of load-shedding-based backpressure is that an
> increased number of dropped mutations raises the chance of inconsistency
> among replicas and will likely require more repair (hints can help to some
> extent, but they are not designed to cover all inconsistencies). Another
> downside is that excessive writes put much more pressure on compaction
> (especially LCS). Backlogged compaction increases read latency and causes
> more frequent GC pauses, and depending on the type of compaction, some
> backlog can take a long time to clear even after the write load is removed.
> The current load-shedding mechanism does not seem adequate for a common bulk
> loading scenario, where clients try to ingest data at the highest possible
> throughput. We need a more direct way to tell the client drivers to slow
> down.
> It appears that HBase suffered from a similar situation, as discussed in
> HBASE-5162, and introduced a special exception type to tell the client to
> slow down when a certain "overloaded" criterion is met. If we can leverage a
> similar mechanism, our dropped mutation events can be used to trigger such
> exceptions to push back on the client. At the same time, backlogged
> compaction (when the number of pending compactions exceeds a certain
> threshold) can also be used for push-back, which can prevent the vicious
> cycle mentioned in
> https://issues.apache.org/jira/browse/CASSANDRA-11366?focusedCommentId=15198786&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15198786.
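
For illustration only, the push-back idea in the quoted description could be
sketched roughly as below. This is a toy simulation, not Cassandra or driver
code: OverloadedError, NodeLoad, check_overload, write_with_backoff and both
threshold values are hypothetical names and numbers chosen for the example.
The ticket does not specify any of them.

```python
import time
from dataclasses import dataclass


class OverloadedError(Exception):
    """Hypothetical signal telling the client to slow down."""


@dataclass
class NodeLoad:
    dropped_mutations: int    # mutations dropped in a recent window
    pending_compactions: int  # compaction tasks waiting to run


def check_overload(load, max_dropped=0, max_pending_compactions=100):
    """Raise OverloadedError if either (assumed) threshold is exceeded."""
    if load.dropped_mutations > max_dropped:
        raise OverloadedError("dropped mutations: %d" % load.dropped_mutations)
    if load.pending_compactions > max_pending_compactions:
        raise OverloadedError("pending compactions: %d" % load.pending_compactions)


def write_with_backoff(write, get_load, max_attempts=5, base_delay=0.1,
                       sleep=time.sleep):
    """Retry a write, doubling the delay each time the node reports overload."""
    delay = base_delay
    for attempt in range(max_attempts):
        try:
            check_overload(get_load())  # stands in for the server-side check
            return write()
        except OverloadedError:
            if attempt == max_attempts - 1:
                raise  # give up and surface the overload to the caller
            sleep(delay)
            delay *= 2
```

The exponential backoff is one possible client reaction; the description only
asks for a way to tell the driver to slow down and leaves the policy open.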