[
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368604#comment-15368604
]
Jason Brown commented on CASSANDRA-8457:
----------------------------------------
Here's the first pass at switching internode messaging to netty.
||8457||
|[branch|https://github.com/jasobrown/cassandra/tree/8457]|
|[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-8457-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-8457-testall/]|
I've tried to preserve as much of the functionality and behavior of the existing
implementation as I could, though some aspects were a bit tricky to translate
into the non-blocking IO, netty world. I've also documented the code extensively,
and I still want to add more high-level docs on 1) the internode protocol itself,
and 2) the use of netty in internode messaging. Hopefully the current state of
the documentation helps with understanding and reviewing the changes.
Here are some high-level notes as points of departure/interest/discussion:
- I've left the existing {{OutboundTcpConnection}} code largely intact for the
short term (read on for more detail), so the new and existing implementations
coexist in the code (though not at run time).
- There is a yaml property to enable/disable using netty for internode
messaging. If disabled, we fall back to the existing {{OutboundTcpConnection}}
code. Part of the reason for this is that streaming uses the same socket
infrastructure and handshake as internode messaging, and streaming would be
broken without the {{OutboundTcpConnection}} implementation. I am knee-deep in
switching streaming over to a non-blocking, netty-based solution, but that is a
separate ticket/body of work.
- In order to support non-blocking IO, I've altered the internode messaging
protocol so that each message is framed and the frame carries the message size
(a minimal framing sketch follows these notes). The protocol change is what
forces this work to land on a major rev update, hence 4.0.
- Backward compatibility - We will need to handle the case of a cluster upgrade
where some nodes are on the previous version of the protocol (not yet upgraded)
and some are upgraded. The upgraded nodes will still need to behave and operate
correctly with the older nodes; that functionality is encapsulated and
documented in {{LegacyClientHandler}} (for the receive side) and
{{MessageOutHandler}} (for the send side). The receive-side switch is sketched
after these notes.
- Message coalescing - The existing behaviors in {{CoalescingStrategies}} are
predicated on parking the thread to allow outbound messages to arrive (and be
coalesced). Parking a thread in a non-blocking/netty context is a bad thing, so
I've inverted the behavior of message coalescing a bit. Instead of blocking a
thread, I've extended the {{CoalescingStrategies.CoalescingStrategy}}
implementations to return a 'time to wait' to let more messages arrive for
sending. I then schedule a task on the netty scheduler to execute that many
nanoseconds in the future, queue up incoming messages, and send them all when
the scheduled task executes (this is {{CoalescingMessageOutHandler}}; see the
coalescing sketch below). I've also added callback functions to the
{{CoalescingStrategies.CoalescingStrategy}} implementations for the
non-blocking paradigm to record updates to the strategy (for recalculation of
the time window, etc).
- Message flushing - Currently in {{OutboundTcpConnection}}, we only call flush
on the output stream when the backlog is empty (there are no more messages to
send to the peer). Unfortunately there's no equivalent API in netty to know
whether any messages are waiting in the channel to be sent. The solution I've
gone with is a counter shared between the outside of the channel
({{InternodeMessagingConnection#outboundCount}}) and the inside
({{CoalescingMessageOutHandler#outboundCounter}}); when
{{CoalescingMessageOutHandler}} sees the value of that counter hit zero, it
knows it can explicitly call flush (see the counter sketch below). I'm not
entirely thrilled with this approach, and there are some potential
race/correctness problems (and complexity!) when reconnections occur, so I'm
open to suggestions on how to achieve this functionality.
- I've included support for netty's OpenSSL library. The operator will need to
deploy an extra netty jar (http://netty.io/wiki/forked-tomcat-native.html) to
get the OpenSSL behavior (I'm not sure if we can, or want to, include it in our
distro). {{SSLFactory}} needed to be refactored a bit to support the OpenSSL
functionality; the last sketch below shows the provider selection.
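First, the framing change. This is a minimal, hypothetical illustration using
netty's stock length-field codecs to show what "each message is framed, and the
frame contains a message size" buys us; the actual branch defines its own frame
layout and handlers, so the class choices and constants here are not from the
patch.
{code:java}
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.LengthFieldBasedFrameDecoder;
import io.netty.handler.codec.LengthFieldPrepender;

public class FramingInitializerSketch extends ChannelInitializer<SocketChannel>
{
    private static final int MAX_FRAME_SIZE = 128 * 1024 * 1024; // assumption, not the patch's limit

    @Override
    protected void initChannel(SocketChannel ch)
    {
        // outbound: prepend a 4-byte size to every serialized message
        ch.pipeline().addLast(new LengthFieldPrepender(4));
        // inbound: buffer until a complete frame (4-byte size + body) has arrived,
        // then strip the size prefix; downstream handlers never see a partial
        // message, which is what makes non-blocking reads workable
        ch.pipeline().addLast(new LengthFieldBasedFrameDecoder(MAX_FRAME_SIZE, 0, 4, 0, 4));
    }
}
{code}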
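Second, the shape of the receive-side version switch. The handshake layout, the
version constant, and all the names in this sketch are my assumptions (the real
logic lives in {{LegacyClientHandler}}); the point is the pipeline-replace
idiom: read the peer's messaging version once, then swap in the decode path
that matches it.
{code:java}
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.ByteToMessageDecoder;
import java.util.List;

public class VersionDispatchSketch extends ByteToMessageDecoder
{
    private static final int VERSION_40 = 10; // hypothetical wire constant

    private final ChannelHandler legacyPath; // e.g. the legacy, unframed deserializer
    private final ChannelHandler framedPath; // e.g. the frame decoder from the sketch above

    public VersionDispatchSketch(ChannelHandler legacyPath, ChannelHandler framedPath)
    {
        this.legacyPath = legacyPath;
        this.framedPath = framedPath;
    }

    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out)
    {
        if (in.readableBytes() < 4)
            return; // wait until the peer's advertised messaging version has arrived

        int peerVersion = in.readInt();
        // swap this dispatcher for the path matching the peer; netty forwards any
        // bytes already buffered in this decoder to the replacement handler
        ctx.pipeline().replace(this, "messageDecoder",
                               peerVersion < VERSION_40 ? legacyPath : framedPath);
    }
}
{code}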
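Third, the inverted coalescing. A rough sketch assuming a fixed coalesce window
in place of a real {{CoalescingStrategies.CoalescingStrategy}} (the
{{coalesceWindowNanos}} stand-in is mine): writes pass through unflushed, and a
task scheduled on the channel's event loop flushes everything that accumulated
during the window, so no thread is ever parked.
{code:java}
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelOutboundHandlerAdapter;
import io.netty.channel.ChannelPromise;
import java.util.concurrent.TimeUnit;

public class CoalescingFlushSketch extends ChannelOutboundHandlerAdapter
{
    private boolean flushScheduled; // only touched on the event loop, so no locking

    // stand-in for asking the CoalescingStrategy for its current 'time to wait'
    private long coalesceWindowNanos()
    {
        return 200_000; // assumption: 200 microseconds
    }

    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise)
    {
        ctx.write(msg, promise); // pass the message along, but do not flush yet
        if (!flushScheduled)
        {
            flushScheduled = true;
            ctx.executor().schedule(() -> {
                flushScheduled = false;
                ctx.flush(); // everything written during the window goes out together
            }, coalesceWindowNanos(), TimeUnit.NANOSECONDS);
        }
    }
}
{code}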
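Fourth, the shared-counter flush heuristic. The names here are placeholders for
{{InternodeMessagingConnection#outboundCount}} /
{{CoalescingMessageOutHandler#outboundCounter}}: the producer increments the
counter before handing a message to the channel, and the in-channel handler
flushes when its decrement observes zero, which is also exactly where the
reconnect races mentioned above can creep in.
{code:java}
import io.netty.channel.Channel;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelOutboundHandlerAdapter;
import io.netty.channel.ChannelPromise;
import java.util.concurrent.atomic.AtomicLong;

public class CountingFlushSketch extends ChannelOutboundHandlerAdapter
{
    private final AtomicLong pending; // shared with the producer outside the channel

    public CountingFlushSketch(AtomicLong pending)
    {
        this.pending = pending;
    }

    // producer side: bump the shared counter, then hand the message to the channel
    public static void send(Channel channel, Object msg, AtomicLong pending)
    {
        pending.incrementAndGet();
        channel.write(msg);
    }

    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise)
    {
        ctx.write(msg, promise);
        // zero means no message is queued behind this one, so flush now; a
        // concurrent enqueue between the decrement and the flush is benign (it
        // just gets flushed a little early), but reconnects are trickier
        if (pending.decrementAndGet() == 0)
            ctx.flush();
    }
}
{code}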
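Finally, the OpenSSL selection. A sketch of how netty lets us prefer the
tcnative/OpenSSL provider when the extra jar from the wiki page above is on the
classpath, and fall back to the JDK provider otherwise; the file paths are
invented, and the branch does this inside the refactored {{SSLFactory}}.
{code:java}
import io.netty.handler.ssl.OpenSsl;
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.SslProvider;
import java.io.File;

public final class SslContextSketch
{
    public static SslContext serverContext() throws Exception
    {
        // OpenSsl.isAvailable() is true only when the netty-tcnative jar was deployed
        SslProvider provider = OpenSsl.isAvailable() ? SslProvider.OPENSSL : SslProvider.JDK;
        return SslContextBuilder
               .forServer(new File("conf/node.crt"), new File("conf/node.key")) // hypothetical paths
               .sslProvider(provider)
               .build();
    }
}
{code}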
I'll be doing some more extensive testing next week (including a more thorough
exploration of the backward compatibility).
> nio MessagingService
> --------------------
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Jonathan Ellis
> Priority: Minor
> Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two per peer: one incoming and one outbound) is a
> big contributor to context switching, especially for larger clusters. Let's
> look at switching to nio, possibly via Netty.