[
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523138#comment-15523138
]
Sylvain Lebresne commented on CASSANDRA-8457:
---------------------------------------------
Here comes more review bits, though it's still incomplete as I have to context
switch a bit. I'll start by a number of general remarks, followed by some more
specific comments on some classes, and a few random nits to finish.
The first thing that I think we should discuss is how we want to handle the
transition to this (that is, what happens to the old implementation). Namely, I
see 2 possible options:
# we simply switch to Netty, period, and the old implementation goes away;
# we keep both implementations in, with a configuration flag to pick which one
is used as implemented in the patch, and this probably until 5.0.
I think the main advantage of keeping both implementation for a while is that
it's somewhat safer and easier for user to evaluate outside of the 4.0 upgrade
itself. But the downsides are obviously that it's a lot more code to maintain,
it's more messy right off the bat, and given only the new implementation will
stay eventually, it's only pushing the switch to later. Personally, since we'll
ship this in a new major (4.0) and since the cost of maintaining 2
implementations is pretty unpleasant and we don't have infinite devs resources,
I'd have a slightly preference for option 1, but I'm not going to argue
strongly either way. Mostly, I want us to decide, because I think this
influence the final product here.
The second thing I want to mention is streaming. I could be missing something
here but it appears streaming is just not supported at all by the new code,
meaning that the patch as is is unusable outside of toy testing (I mean by that
that if you start a node with the Netty MessagingService, it seems you can't
received any streaming, so you're not a fully functioning node). So I'm kind of
wondering what is the plan here. I'm happy to thinker on another ticket as
Streaming is a best on its own, but I'm not sure I'm ok committing anything
until it's a fully working product.
So anyway, with those talking point out of the way, a few more "technical"
general remarks on the patch:
* Netty's openssl: what's the fundamental difference between the new
{{keyfile}} and the old {{keystore}}? It sounds confusing to have to configure
new things when there is already pre-existing parameters for SSL.
* Some "old" code is made public (many constants in particular) and reused by
the "new" code, while other bits are copy-pasted, but it's not very consistent
and makes things sometimes a bit hard to follow ({{QueuedMessage}} (used by new
code too) being an internal class of {{OutboundTcpConnection}} (old code) is a
particularly bad offender imo). That's typically where I want us to make a
decision on the point I raise above, as whatever decision we take I feel there
is some cleanups to be done.
* It would be really nice to have a package javadoc for the new "async" package
that explains the main classes involved and how they interact.
* There is a few TODO that needs to be handled.
* I feel some of the naming could be made more consistent. In particular, the
patch uses Client/Server for some classes (e.g.
{{ClientHandshakeHandler}}/{{ClientConnector}}) but In/Out for others (e.g.
{{MessageInHandler}}). And things like {{InternodeMessagingPool}} is imo less
good than the old {{OutboundTcpConnectionPool}} in that it doesn't indicates
it's only for outbound connections. I'd prefer avoid Client/Server for
internode stuffs and reserve it for native protocol classes, sticking to In/Out
(or Inbound/Outbound when that reads better) for internode classes, like we did
before. I understand this may have been to avoid name clash with existing
classes, but maybe that's where we should also decide how long the old
implementation to stay in.
Then a few more specific comments:
* In {{MessagingService}}:
** We should probably move everything related to {{ServerSocket}} inside the
new {{BlockingIOSender}} (though again, if we decide to just ditch the old
implementation, it simply goes away).
** Related to the previous point, the {{getSocketThreads()}} is used by
{{StreamingTransferTest}} to apparently wait for the streaming connections to
close. Returning an empty list seems to kind of break that. I think we could
replace the method by some {{awaitAllStreamingConnectionsClosed()}} (that would
still just be for testing). Of course, that's kind of streaming related.
** I would probably rename {{switchIpAddress()}} to {{reconnectWithNewIp()}} as
it makes it more explicit what does happen (or at least have some javadoc).
Similarly, {{getCurrentEndpoint()}} is imo no very clearly named (not this
patch fault though), maybe {{getPreferredAddress()}}?
** In {{createConnection()}}, it appears the old implementation was starting
the connections and waiting on them to be properly started before returning,
but the new implementation doesn't. Not sure how important that is in practice,
but it feels unintuitive that a {{createConnection()}} method wouldn't actually
do anything.
** Seems {{NettySender.listen()}} always starts a non-secure connection, even
if we have {{ServerEncryptionOptions.InternodeEncryption.all}}, which sounds
wrong.
* In {{ClientConnect}}, in {{connectComplete}}, checking that the future is
done is somewhat non-sensical as it shouldn't happen by definition of the
method. Maybe an {{assert}} instead? Also, the method should be
{{@VisibleForTesting}}.
* In {{ClientHandshakeHandler}}:
** There seem to be a typo in javadoc of {{remoteAddr}} (at least I'm having a
hard time parsing it).
** Could abstract the message size (in the call to {{ioBuffer}}) into a
constant, if only for symmetry with {{SECOND_MESSAGE_LENGTH}}.
** {{handshakeTimeout}} feels misnamed (with the current name, the call in
{{exceptionCaught}} is confusing: "why are we timeouting on an exception?!").
It's really failing the handshake, so I'd call it {{failHandshake}} or
{{abortHanshake}}. Somewhat related, the field {{handshakeResponse}} is really
a future on the timeout, so I'd call it {{timeoutFuture}}.
** In {{decode}}, there is 2 different cases of version mismatch: one where we
use to do a hard disconnect, and one where we did a "soft" one. The new
implementation still pretend to do something different in the message it logs,
but doesn't appear to make a difference in practice, which is confusing.
** It's a tad unexpected to have {{ClientHandshakeResult}} as a subclass of
{{InternodeMessagingConnection}} rather than of this class
({{ClientHandshakeHandler}}), and the fact some final parts of the handshake
are actually happening in {{InternodeMessagingConnection}} make things a tad
harder to follow.
** If the handshake timeout, it doesn't seem we log anything special. Might be
worth at least having some debug message?
* In {{CoalescingMessageOutHandler}}:
** It would feel simpler (and possibly more efficient) to not add the handler
to the pipeline at all if the strategy isn't doing coalescing (and assert it
does in the handler).
** In {{write}}, we can avid adding the listener if {{coalesceCallback ==
null}} rather than checking inside the listener.
And that's mostly it for now, but more comments on other classes should come at
a later time. Just a few small random
nits for the road:
* In {{IncomingTcpConnection}}, a new {{from}} parameter is added to
{{receiveMessage}} but it seems unused.
* You editor seems to expand {{.*}} imports, which adds unecessary noise to the
patch/we generally don't do.
* {{MessageOutHandler}} creates it's {{Logger}} using
{{CoalescingMessageOutHandler}} rather than himself (copy-paste typo I assume).
> nio MessagingService
> --------------------
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Jonathan Ellis
> Assignee: Jason Brown
> Priority: Minor
> Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big
> contributor to context switching, especially for larger clusters. Let's look
> at switching to nio, possibly via Netty.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)