Wei Deng created CASSANDRA-11380:
------------------------------------

             Summary: Client visible backpressure mechanism
                 Key: CASSANDRA-11380
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11380
             Project: Cassandra
          Issue Type: New Feature
          Components: Coordination
            Reporter: Wei Deng


Cassandra currently lacks a sophisticated backpressure mechanism to prevent 
clients from ingesting data at too high a throughput. One reason is its SEDA 
(Staged Event-Driven Architecture) design. With SEDA, an overloaded thread pool 
can drop droppable messages (in this case, MutationStage can drop mutation or 
counter mutation messages) when they exceed the 2-second timeout. This can save 
the JVM from running out of memory and crashing. However, this load-shedding 
approach to backpressure has two downsides. First, a larger number of dropped 
mutations increases the chance of inconsistency among replicas and will likely 
require more repair (hints can help to some extent, but they are not designed 
to cover all inconsistencies). Second, excessive writes put much more pressure 
on compaction (especially LCS); backlogged compaction increases read latency 
and causes more frequent GC pauses, and depending on the type of compaction, 
some backlog can take a long time to clear even after the write load is 
removed. The current load-shedding mechanism seems inadequate for a common 
bulk-loading scenario, where clients try to ingest data at the highest 
throughput possible. We need a more direct way to tell the client drivers to 
slow down.

It appears that HBase suffered from a similar situation, as discussed in 
HBASE-5162, and introduced a special exception type to tell the client to slow 
down when a certain "overloaded" criterion is met. If we can leverage a similar 
mechanism, our dropped mutation events can be used to trigger such exceptions 
to push back on the client. At the same time, backlogged compaction (when the 
number of pending compactions exceeds a certain threshold) can also be used for 
the push back, which can prevent the vicious cycle mentioned in 
https://issues.apache.org/jira/browse/CASSANDRA-11366?focusedCommentId=15198786&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15198786.
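
To make the proposal above more concrete, here is a minimal sketch of how the 
two triggers (dropped mutations and pending compactions) could feed a single 
"overloaded" check that pushes back on the client. Everything in it is an 
assumption made for illustration: BackpressureSketch, OverloadDetector, 
OverloadedPushbackException, the windowed drop counter, and both thresholds are 
hypothetical names and are not existing Cassandra or HBase APIs.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch only; none of these types exist in Cassandra.
final class BackpressureSketch {

    // Hypothetical exception a coordinator could return to ask a client to slow down.
    static final class OverloadedPushbackException extends RuntimeException {
        OverloadedPushbackException(String message) {
            super(message);
        }
    }

    // Tracks the two overload signals named in the proposal.
    static final class OverloadDetector {
        private final long maxDroppedMutations;   // assumed threshold per window
        private final int maxPendingCompactions;  // assumed threshold
        private final AtomicLong droppedInWindow = new AtomicLong();
        private volatile int pendingCompactions;

        OverloadDetector(long maxDroppedMutations, int maxPendingCompactions) {
            this.maxDroppedMutations = maxDroppedMutations;
            this.maxPendingCompactions = maxPendingCompactions;
        }

        // Called whenever a mutation is dropped after exceeding its timeout.
        void onMutationDropped() {
            droppedInWindow.incrementAndGet();
        }

        // Called periodically with the current number of pending compactions.
        void setPendingCompactions(int pending) {
            this.pendingCompactions = pending;
        }

        // Called at the start of each measurement window.
        void resetWindow() {
            droppedInWindow.set(0);
        }

        // True when either signal exceeds its threshold.
        boolean isOverloaded() {
            return droppedInWindow.get() > maxDroppedMutations
                || pendingCompactions > maxPendingCompactions;
        }

        // Called before accepting a client write; throws to push back.
        void checkBeforeWrite() {
            if (isOverloaded()) {
                throw new OverloadedPushbackException(
                    "node overloaded: dropped=" + droppedInWindow.get()
                    + ", pendingCompactions=" + pendingCompactions);
            }
        }
    }
}
```

In this sketch a client driver that receives the exception would be expected to 
reduce its request rate before retrying; how the signal is carried over the 
native protocol is left open here.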



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
