[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368604#comment-15368604 ]

Jason Brown commented on CASSANDRA-8457:
----------------------------------------

Here's the first pass at switching internode messaging to netty.
||8457||
|[branch|https://github.com/jasobrown/cassandra/tree/8457]|
|[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-8457-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-8457-testall/]|

I've tried to preserve as much of the functionality/behavior of the existing 
implementation as I could, and some aspects were a bit tricky in the 
non-blocking IO, netty world. I've also documented the code as extensively as I 
can, and I still want to add more high-level docs on 1) the internode protocol 
itself, and 2) the use of netty in internode messaging. Hopefully the current 
state of documentation helps with understanding and reviewing the changes. 
Here are some high-level notes as points of departure/interest/discussion:

- I've left the existing {{OutboundTcpConnection}} code largely intact for the 
short term (read on for more detail). The new and existing behaviors mostly 
coexist in the code together (though not at run time).
- There is a yaml property to enable/disable using netty for internode 
messaging. If disabled, we'll fall back to the existing 
{{OutboundTcpConnection}} code. Part of this stems from the fact that streaming 
uses the same socket infrastructure and handshake as internode messaging, and 
streaming would be broken without the {{OutboundTcpConnection}} 
implementation. I am knee-deep in switching streaming over to a non-blocking, 
netty-based solution, but that is a separate ticket/body of work.
- In order to support non-blocking IO, I've altered the internode messaging 
protocol so that each message is framed, and the frame contains the message 
size (see the framing sketch after this list). The protocol change is what 
forces these changes to happen at a major rev update, hence 4.0.
- Backward compatibility - We will need to handle the case of a cluster 
upgrade where some nodes are on the previous version of the protocol (not 
upgraded) and some are upgraded. The upgraded nodes will still need to behave 
and operate correctly with the older nodes; that functionality is encapsulated 
and documented in {{LegacyClientHandler}} (for the receive side) and 
{{MessageOutHandler}} (for the send side).
- Message coalescing - The existing behaviors in {{CoalescingStrategies}} are 
predicated on parking the thread to allow outbound messages to arrive (and be 
coalesced). Parking a thread in a non-blocking/netty context is a bad thing, so 
I've inverted the behavior of message coalescing a bit. Instead of blocking a 
thread, I've extended the {{CoalescingStrategies.CoalescingStrategy}} 
implementations to return a 'time to wait' that lets messages arrive for 
sending. I then schedule a task in the netty scheduler to execute that many 
nanoseconds in the future, queue up incoming messages, and send them out when 
the scheduled task executes (this is {{CoalescingMessageOutHandler}}; see the 
coalescing sketch after this list). I've also added callback functions to the 
{{CoalescingStrategies.CoalescingStrategy}} implementations for the 
non-blocking paradigm to record updates to the strategy (for recalculation of 
the time window, etc).
- Message flushing - Currently in {{OutboundTcpConnection}}, we only call flush 
on the output stream if the backlog is empty (there are no more messages to 
send to the peer). Unfortunately there's no equivalent API in netty to know 
whether any messages are waiting in the channel to be sent. The solution I've 
gone with is a shared counter outside of the channel 
({{InternodeMessagingConnection#outboundCount}}) and inside the channel 
({{CoalescingMessageOutHandler#outboundCounter}}); when 
{{CoalescingMessageOutHandler}} sees the value of that counter is zero, it 
knows it can explicitly call flush (see the counter-gated flush sketch after 
this list). I'm not entirely thrilled with this approach, and there are some 
potential race/correctness problems (and complexity!) when reconnections 
occur, so I'm open to suggestions on how to achieve this functionality.
- I've included support for netty's OpenSSL library. The operator will need to 
deploy an extra netty jar (http://netty.io/wiki/forked-tomcat-native.html) to 
get the OpenSSL behavior (I'm not sure if we can or want to include it in our 
distro). {{SSLFactory}} needed to be refactored a bit to support the OpenSSL 
functionality (see the SSL sketch after this list).
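
To make the framing change concrete, here's a minimal sketch (not the patch's 
actual codec) of a length-prefixed frame built from netty's stock codecs; the 
4-byte size field, the frame-size cap, and the class name are illustrative 
assumptions:

{code:java}
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.LengthFieldBasedFrameDecoder;
import io.netty.handler.codec.LengthFieldPrepender;

public class FramingInitializer extends ChannelInitializer<SocketChannel>
{
    private static final int MAX_FRAME_SIZE = 128 * 1024 * 1024; // hypothetical cap

    @Override
    protected void initChannel(SocketChannel ch)
    {
        // inbound: wait for the 4-byte size prefix, strip it, and deliver one
        // complete message at a time, so deserialization never blocks on a partial read
        ch.pipeline().addLast(new LengthFieldBasedFrameDecoder(MAX_FRAME_SIZE, 0, 4, 0, 4));
        // outbound: prepend the serialized size to every message
        ch.pipeline().addLast(new LengthFieldPrepender(4));
    }
}
{code}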
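And here's the inverted coalescing idea in miniature: instead of parking a 
thread, schedule a flush on the event loop and let writes queue up until the 
task fires. The handler name and the fixed wait are stand-ins; 
{{CoalescingMessageOutHandler}} itself derives the window from the strategy:

{code:java}
import java.util.concurrent.TimeUnit;

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelOutboundHandlerAdapter;
import io.netty.channel.ChannelPromise;

public class CoalescingFlushHandler extends ChannelOutboundHandlerAdapter
{
    // only ever touched on the channel's event loop, so no synchronization needed
    private boolean flushScheduled;

    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise)
    {
        // queue the message in the channel's outbound buffer, but do not flush yet
        ctx.write(msg, promise);

        if (!flushScheduled)
        {
            flushScheduled = true;
            long waitNanos = 10_000; // stand-in for the strategy's computed coalesce window
            ctx.executor().schedule(() ->
            {
                flushScheduled = false;
                ctx.flush(); // push out everything that arrived during the window
            }, waitNanos, TimeUnit.NANOSECONDS);
        }
    }
}
{code}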
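The counter-gated flush, sketched under the same caveats (names and wiring are 
simplified from the real 
{{InternodeMessagingConnection}}/{{CoalescingMessageOutHandler}} pair):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelOutboundHandlerAdapter;
import io.netty.channel.ChannelPromise;

public class CounterGatedFlushHandler extends ChannelOutboundHandlerAdapter
{
    // shared with the enqueuing side, which increments it for every message
    // it hands to the channel
    private final AtomicLong outboundCount;

    public CounterGatedFlushHandler(AtomicLong outboundCount)
    {
        this.outboundCount = outboundCount;
    }

    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise)
    {
        ctx.write(msg, promise);
        // flush only when no further messages are known to be pending;
        // otherwise let subsequent writes share one flush/syscall
        if (outboundCount.decrementAndGet() == 0)
            ctx.flush();
    }
}
{code}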
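Finally, a minimal sketch of how netty lets us pick the OpenSSL provider when 
the extra jar is present (the certificate handling here is a placeholder; the 
refactored {{SSLFactory}} covers the real configuration):

{code:java}
import java.io.File;
import javax.net.ssl.SSLException;

import io.netty.handler.ssl.OpenSsl;
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.SslProvider;

public final class SslContexts
{
    public static SslContext serverContext(File certChain, File privateKey) throws SSLException
    {
        // use the native OpenSSL engine only if the operator has deployed the
        // netty-tcnative jar; otherwise fall back to the JDK's SSL engine
        SslProvider provider = OpenSsl.isAvailable() ? SslProvider.OPENSSL : SslProvider.JDK;
        return SslContextBuilder.forServer(certChain, privateKey)
                                .sslProvider(provider)
                                .build();
    }
}
{code}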

I'll be doing some more extensive testing next week (including a more thorough 
exploration of backward compatibility).

> nio MessagingService
> --------------------
>
>                 Key: CASSANDRA-8457
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Priority: Minor
>              Labels: netty, performance
>             Fix For: 4.x
>
>
> Thread-per-peer (actually two per peer: one incoming, one outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.


