[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968472#comment-15968472 ]
Jason Brown commented on CASSANDRA-8457:
----------------------------------------

So, [~aweisberg] and I spent some time talking offline about expiring messages on the outbound side, and came up with the following:

1. Run a periodic, scheduled task in each channel that checks that the channel is making progress wrt sending bytes. If we see no progress after some number of seconds, we should close the connection/socket and throw away the messages.

2. Repurpose the high/low water mark (and arguably use it more correctly) to indicate when we should stop writing messages to the channel (at the {{ChannelWriter}} layer). Currently, I'm just using the water mark to indicate when we should flush, but a simple check elsewhere would accomplish the same thing. Instead, the water marks should indicate when we really shouldn't write to the channel anymore, and we should either queue up those messages in something like {{OutboundMessageConnection#backlog}} or perhaps drop them (I'd prefer to queue).

3. When we've exceeded the high water mark, we can disable reading incoming messages from the same peer (achievable by disabling auto read for the channel). This would prevent the current node from executing more work on behalf of a peer to which we cannot send any data. Then, when the channel drops below the low water mark (and the channel is 'writable'), we re-enable netty auto read on the read channels for the peer.

1 and 2 are reasonably easy to do (and I'll do them asap), but I'd prefer to defer 3 until later, as it has a lot of races and other complexities/subtleties that I'd like to keep out of the scope of this ticket (especially as sockets are not bidirectional yet).

Thoughts?

Note: items 1 & 2 are significantly simpler than my earlier comments wrt message expiration, so please disregard those earlier comments for now.
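To make items 1 and 2 concrete, here is a rough, self-contained sketch of the bookkeeping involved. It is not the patch and does not use Netty: the class {{WaterMarkedWriter}} and all of its methods are hypothetical names invented for illustration, time is passed in explicitly so the logic can be exercised deterministically, and the "channel" is reduced to a counter of pending bytes. In a real implementation the progress check would presumably be run by a periodically scheduled task, and writability would presumably come from the channel itself.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch of items 1 and 2; none of these names exist in Cassandra or Netty. */
final class WaterMarkedWriter {
    private final long lowWaterMark;
    private final long highWaterMark;
    private final long stallTimeoutNanos;
    private final Queue<byte[]> backlog = new ArrayDeque<>();

    private long pendingBytes;      // bytes written to the channel but not yet flushed to the socket
    private long lastProgressNanos; // last time we saw bytes leave (or started a fresh write)
    private boolean writable = true;
    private boolean closed;

    WaterMarkedWriter(long lowWaterMark, long highWaterMark, long stallTimeoutNanos, long nowNanos) {
        if (lowWaterMark < 0 || lowWaterMark >= highWaterMark)
            throw new IllegalArgumentException("need 0 <= low < high");
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
        this.stallTimeoutNanos = stallTimeoutNanos;
        this.lastProgressNanos = nowNanos;
    }

    /** Item 2: write while below the high water mark, otherwise queue. Returns true if written. */
    boolean write(byte[] message, long nowNanos) {
        if (closed) return false;
        if (!writable) {
            backlog.add(message); // queue rather than drop
            return false;
        }
        if (pendingBytes == 0) lastProgressNanos = nowNanos; // an idle channel is not a stalled one
        pendingBytes += message.length;
        if (pendingBytes >= highWaterMark) writable = false;
        return true;
    }

    /** Called when bytes have actually been sent; drains the backlog once below the low water mark. */
    void onBytesFlushed(long count, long nowNanos) {
        if (closed || count <= 0) return;
        pendingBytes = Math.max(0, pendingBytes - count);
        lastProgressNanos = nowNanos;
        if (!writable && pendingBytes <= lowWaterMark) {
            writable = true;
            while (writable && !backlog.isEmpty()) write(backlog.poll(), nowNanos);
        }
    }

    /** Item 1: the periodic check. Returns false (and closes) if no progress within the timeout. */
    boolean checkProgress(long nowNanos) {
        if (closed) return false;
        if (pendingBytes == 0) return true; // nothing outstanding, so nothing can be stalled
        if (nowNanos - lastProgressNanos >= stallTimeoutNanos) {
            closed = true;   // stands in for closing the connection/socket
            backlog.clear(); // throw away the queued messages
            pendingBytes = 0;
            return false;
        }
        return true;
    }

    boolean isWritable() { return writable && !closed; }
    boolean isClosed() { return closed; }
    int backlogSize() { return backlog.size(); }
    long pendingBytes() { return pendingBytes; }
}
```

The sketch queues instead of dropping, matching the preference stated in item 2. It deliberately leaves out item 3: toggling auto read on the inbound channels needs coordination across connections, which is exactly the racy part proposed for deferral.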

> nio MessagingService
> --------------------
>
>                 Key: CASSANDRA-8457
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: netty, performance
>             Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big
> contributor to context switching, especially for larger clusters.  Let's look
> at switching to nio, possibly via Netty.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)