[
https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803613#action_12803613
]
Jonathan Ellis commented on CASSANDRA-685:
------------------------------------------
We need to either (a) have a separate deserialization queue for "reply" traffic
(we could use one of the "header" bits that isn't part of the Message proper to
control this), (b) drop messages for overloaded stages on the floor so the
deserializer doesn't overload, or (c) give up the command/reply division
entirely.
Alternatively, option (b) reminds me that instead of "backpressure" we could
just use "timeoutpressure": instead of overloaded stages backpressuring the
message deserializer, which in turn backpressures socket reads, the
deserializer can just discard messages the system is too busy to handle. The
downside is that it will take an extra rpc_timeout of latency before the
clients start to get timeouts. The upside is that, as things unclog, the
messages that get processed will be fresh ones, so we are less likely to waste
work processing messages that the client isn't even waiting for anymore.
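
As a rough illustration of the "timeoutpressure" idea, here is a minimal sketch in which the deserializer hands work to a bounded stage queue with a non-blocking offer and simply drops the message when the stage is full. The class name DroppingStage and its methods are hypothetical and are not part of Cassandra; only the java.util.concurrent calls are real.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: not Cassandra's actual stage implementation.
final class DroppingStage {
    private final BlockingQueue<Runnable> queue;
    private final AtomicLong dropped = new AtomicLong();

    DroppingStage(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Called by the deserializer. Never blocks: if the stage is full the
     * task is discarded and counted, and the client eventually times out.
     */
    boolean submit(Runnable task) {
        if (queue.offer(task)) {
            return true;
        }
        dropped.incrementAndGet();
        return false;
    }

    /** Non-blocking take used by a worker; returns null when the queue is empty. */
    Runnable next() {
        return queue.poll();
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

Because submit never blocks, the deserializer keeps draining the socket even while a stage is saturated; the cost is that the sender only learns about the drop when its own rpc_timeout expires.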
Also, I'd like to dynamically adjust stage capacity based on the amount of work
that gets processed, rather than have a fixed value that has to be manually
tuned. Not sure what that would look like -- none of the Java BlockingQueue
classes have adjustable capacity post-construction. But stage enqueueing is
only done in one place (by the deserializer executor), so we can one-off
something if we have to.
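
One possible shape for such a one-off, sketched under the assumption that there really is only a single enqueuing thread: wrap an unbounded LinkedBlockingQueue and enforce a volatile, adjustable limit at offer time. The class AdjustableCapacityQueue and its methods are hypothetical names for illustration, and the policy for choosing a new capacity is deliberately left out.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: safe only with a single producer thread.
final class AdjustableCapacityQueue<T> {
    private final LinkedBlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private volatile int capacity;

    AdjustableCapacityQueue(int initialCapacity) {
        setCapacity(initialCapacity);
    }

    /** May be called from any thread, e.g. by whatever tunes the stage. */
    void setCapacity(int newCapacity) {
        if (newCapacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = newCapacity;
    }

    /**
     * Single-producer only: consumers can only shrink the queue between the
     * size check and the add, so the limit is never exceeded.
     */
    boolean offer(T item) {
        if (queue.size() >= capacity) {
            return false;
        }
        return queue.offer(item);
    }

    T poll() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }
}
```

Lowering the capacity does not evict anything already queued; it only rejects new items until consumers drain the queue below the new limit.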
> add backpressure to StorageProxy
> --------------------------------
>
> Key: CASSANDRA-685
> URL: https://issues.apache.org/jira/browse/CASSANDRA-685
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Priority: Minor
> Fix For: 0.6
>
> Attachments:
> 0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt,
> 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
>
>
> Now that we have CASSANDRA-401 and CASSANDRA-488 there is one last piece: we
> need to stop the target node from pulling mutations out of MessagingService
> as fast as it can only to take up space in the mutation queue and eventually
> fill up memory.