[ 
https://issues.apache.org/jira/browse/CASSANDRA-13630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587988#comment-16587988
 ] 

Jason Brown commented on CASSANDRA-13630:
-----------------------------------------

Picking this back up after a year, I realize that my previous solution only 
solved part of the problem. I solved the "don't allocate an enormous buffer" 
problem, but I was still allocating an "enourmous buffer"'s worth of memory at 
the same time, albeit across multiple buffers. I believe this is ultimately 
what [~aweisberg]'s concerns with the previous solution encompassed, and I 
fully agree. Further, the previous patch attempted to do more than just solve 
the large buffer problem, it optimized allocating small buffers. With this new 
insight, that optimization is best left to a separate ticket.

Thus, the new solution focuses only on the large buffer problem. The high-level 
overview of this patch is:
 - use the existing {{ByteBufDataOutputStreamPlus}} to chunk up the large 
message into small buffers, and use {{ByteBufDataOutputStreamPlus}} 's existing 
rateLimiting mechanism to make sure we don't keep too much outstanding data in 
the channel
 - rework the inbound side to allow a blocking-style message deserialization.
 - Refactoring to make serilization/deserialization code reusable as well as 
some clean up.

In order to support both serialization and deserialization of arbitrarily large 
messages and our blocking-style {{IVersionedSerializers}}, I need to perform 
the those activities on a separate (background) thread. On the outbound side 
this is achieved with a new {{ChannelWriter}} subclass. On the inbound side, 
there is a fair bit of refactoring, but the thread for deserialization is in 
{{MessageInHandler.BlockingBufferHandler}}. Both of these "background threads" 
are implemented as {{ThreadExecutorServices}} so that if no large messages are 
being sent or received, the thread can be shutdown (and save the system 
resources).

On the outbound side, it is easy to know if a specific 
{{OutboundMessagingConnection}} will be sending large messages, as we can look 
at it's {{OutboundConnectionIdentifier}}. The inbound side does not have that 
luxury, and my previous patch attempted to do some overly clever things. The 
simpler solution is to add a flag to the internode messaging protocol that 
advises the receiving side that the connection will be used for large messages, 
and the receiver can setup appropriately. We already have a flags section in 
the internode messaging protocol's header, and many unused bits within that. 
Further, peers that are unaware of the new bit flag (i.e. any cassandra version 
less than 4.0) will completely ignore the flag as they do not attempt to 
interpret those bits. Thus, this change is rather safe, from a 
protocol/handshake perspective. In fact, I'd like to backport this protocol 
change to 3.0 and 3.11 to have the flag sent out on new connections. The flag 
will be completely ignored on those versions, except when, during a cluster 
upgrade, the 3.0 node connects to a 4.0 node, the 4.0 node will know that the 
connection will contain large messages and can setup the receive side 
appropriately. In no way would operators be required to minor upgrade to those 
versions of 3.X which contain the upgraded messaging version flag (before 
upgrading to 4.0), but it would help make the upgrade to 4.0 smoother from a 
performance/memory management perspective.

The other major aspect of this ticket was a refactoring mostly to move the 
serialization/deserialization out of {{MessageOutHandler}}/{{MessageInHandler}} 
so that logic could be invoked outside of the context of a netty handler. This 
also allowed me to clean up {{MessageIn}} and {{MessageOut}}, as well. Note: 
I've eliminated {{BaseMessageInHandler}} and moved the version-specific 
messaging parsing into classes derived from the new 
{{MessageIn.MessageInProcessor}}. {{MessageInHandler}} now determines if it 
needs to do non-blocking or blocking deserialization, and handles the buffers 
appropriately. {{MessageInHandler}} now derives from 
{{ChannelInboundHandlerAdapter}}, so the error handling changed slightly. The 
refacorings also affected where the unit tests are layed out (corresponding to 
where the logic/code unit test now lives), so I moved things around in there, 
as well.
||13630||
|[branch|https://github.com/jasobrown/cassandra/tree/13630]|
|[utests & 
dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/13630]|

I also needed to make a trivial [change to one 
dtest|https://github.com/jasobrown/cassandra-dtest/tree/13630].

To make the review easier, I've [opened a PR 
here|https://github.com/apache/cassandra/pull/253]

> support large internode messages with netty
> -------------------------------------------
>
>                 Key: CASSANDRA-13630
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13630
>             Project: Cassandra
>          Issue Type: Task
>          Components: Streaming and Messaging
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Major
>             Fix For: 4.0
>
>
> As part of CASSANDRA-8457, we decided to punt on large mesages to reduce the 
> scope of that ticket. However, we still need that functionality to ship a 
> correctly operating internode messaging subsystem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to