[
https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801277#comment-16801277
]
Benedict commented on CASSANDRA-15066:
--------------------------------------
h2. Framing
This patch introduces framing to all internode messages, i.e. the grouping of
messages into a single logical payload with headers and trailers. Each frame is
guaranteed either to contain only complete messages, or to contain part of at
most one message, in which case that (large) message is split across its own
dedicated sequence of frames.
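To make the guarantee concrete, here is a minimal sketch of one way to pack messages under it. This is not the patch's encoder: {{FramePacker}}, {{Frame}}, {{selfContained}} and {{maxPayload}} are hypothetical names chosen for illustration, and headers and trailers are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: names and sizes are assumptions, not the patch's API.
final class FramePacker {
    // A frame's payload plus a flag: true if it holds only complete messages,
    // false if it holds one slice of a single large message.
    static final class Frame {
        final byte[] payload;
        final boolean selfContained;

        Frame(byte[] payload, boolean selfContained) {
            this.payload = payload;
            this.selfContained = selfContained;
        }
    }

    static List<Frame> pack(List<byte[]> messages, int maxPayload) {
        List<Frame> frames = new ArrayList<>();
        byte[] buffer = new byte[maxPayload];
        int used = 0;
        for (byte[] message : messages) {
            if (message.length > maxPayload) {
                // Large message: close the pending frame, then give the message
                // its own sequence of frames, shared with no other message.
                flush(frames, buffer, used);
                used = 0;
                for (int offset = 0; offset < message.length; offset += maxPayload) {
                    int length = Math.min(maxPayload, message.length - offset);
                    frames.add(new Frame(Arrays.copyOfRange(message, offset, offset + length), false));
                }
                continue;
            }
            if (used + message.length > maxPayload) {
                // Small message that does not fit: never split it, start a new frame.
                flush(frames, buffer, used);
                used = 0;
            }
            System.arraycopy(message, 0, buffer, used, message.length);
            used += message.length;
        }
        flush(frames, buffer, used);
        return frames;
    }

    private static void flush(List<Frame> frames, byte[] buffer, int used) {
        if (used > 0)
            frames.add(new Frame(Arrays.copyOf(buffer, used), true));
    }
}
```

With {{maxPayload}} of 8 and messages of 3, 4, 10 and 2 bytes, this yields four frames: one holding the first two messages, two carrying the slices of the 10 byte message, and one holding the last message.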
h3. Correctness
Previously, intra-dc internode messages would be unprotected from corruption by
default, as only LZ4 provided any integrity checks.
All messages to post-4.0 nodes are now written to explicit frames, which may be:
* LZ4 encoded
* CRC protected
* Unprotected (for those that don't care, have a trusted transport layer, or
want comparisons to pre-4.0)
This oversight would probably have limited the upside of the client-side
checksumming patch recently introduced, so this patch ensures that, by default,
all messages are covered by a CRC (though in future we may want to use a CRC64,
or limit the frame size, to ensure strong protection).
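As a rough illustration of what CRC coverage buys, the sketch below appends a CRC32 trailer to a payload and checks it on receipt. It uses the JDK's {{java.util.zip.CRC32}}; the class name {{CrcTrailerSketch}} and the big-endian 4 byte trailer are assumptions for the example, not the frame format in the patch.

```java
import java.util.zip.CRC32;

// Hypothetical illustration only: the trailer layout is an assumption.
final class CrcTrailerSketch {
    // Returns payload followed by a 4-byte big-endian CRC32 of the payload.
    static byte[] protect(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        int value = (int) crc.getValue();
        byte[] frame = new byte[payload.length + 4];
        System.arraycopy(payload, 0, frame, 0, payload.length);
        for (int i = 0; i < 4; i++)
            frame[payload.length + i] = (byte) (value >>> (24 - 8 * i));
        return frame;
    }

    // Recomputes the CRC32 over the payload and compares it with the trailer.
    static boolean isIntact(byte[] frame) {
        if (frame.length < 4)
            return false;
        int payloadLength = frame.length - 4;
        CRC32 crc = new CRC32();
        crc.update(frame, 0, payloadLength);
        int expected = 0;
        for (int i = 0; i < 4; i++)
            expected = (expected << 8) | (frame[payloadLength + i] & 0xFF);
        return expected == (int) crc.getValue();
    }
}
```

A single flipped bit anywhere in the payload makes {{isIntact}} return false, which is the kind of corruption that previously went undetected on uncompressed connections.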
h3. Resilience
All frames are written with a separate CRC-protected header, of 8 bytes (LZ4
frames) and 6 bytes (CRC frames) respectively.
If corruption occurs in this header, the connection must be reset, as before.
If corruption occurs anywhere outside of the header, the corrupt frame will be
skipped, leaving the connection intact and avoiding the unnecessary loss of any
other messages.
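The distinction matters because the header carries the frame length: if the header cannot be trusted, the receiver cannot find the next frame boundary, whereas a bad payload under a good header can simply be stepped over. A minimal sketch of that decision follows, assuming a simplified hypothetical layout (4 byte length, 4 byte CRC32 of the length, payload, 4 byte CRC32 of the payload) rather than the actual 8 or 6 byte headers; {{FrameStreamSketch}} and its methods are illustrative names.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical illustration only: the frame layout here is an assumption.
final class FrameStreamSketch {
    static int crc(byte[] bytes, int offset, int length) {
        CRC32 crc = new CRC32();
        crc.update(bytes, offset, length);
        return (int) crc.getValue();
    }

    // Layout: [length:4][crc(length):4][payload:length][crc(payload):4]
    static byte[] encode(byte[] payload) {
        ByteBuffer out = ByteBuffer.allocate(8 + payload.length + 4);
        out.putInt(payload.length);
        out.putInt(crc(out.array(), 0, 4));
        out.put(payload);
        out.putInt(crc(payload, 0, payload.length));
        return out.array();
    }

    // Returns the payloads of all intact frames; throws if a header is corrupt.
    static List<byte[]> decode(byte[] stream) {
        List<byte[]> delivered = new ArrayList<>();
        ByteBuffer in = ByteBuffer.wrap(stream);
        while (in.remaining() >= 8) {
            int start = in.position();
            int length = in.getInt();
            int headerCrc = in.getInt();
            if (headerCrc != crc(stream, start, 4) || length < 0)
                // The length is untrustworthy, so the next frame cannot be located.
                throw new IllegalStateException("corrupt header: connection must be reset");
            if (in.remaining() < (long) length + 4)
                break; // incomplete frame: a real decoder would wait for more bytes
            byte[] payload = new byte[length];
            in.get(payload);
            int payloadCrc = in.getInt();
            if (payloadCrc == crc(payload, 0, length))
                delivered.add(payload);
            // Otherwise skip only this frame; the trusted length already moved us past it.
        }
        return delivered;
    }
}
```

Corrupting a payload byte of the second of three frames leaves the first and third deliverable; corrupting its length field makes {{decode}} throw, standing in for a connection reset.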
Previously, any issue at any point in the stream would result in the connection
being reset, with the loss of any in-flight messages in the socket’s receive
buffer, in the Netty application buffer, or on the sending node’s socket send
buffer, its Netty flush queue, or in flight on the wire.
h3. Efficiency
We reduce the overall memory footprint, and number of byte shuffles, on both
inbound and outbound.
*Outbound* the Netty LZ4 encoder maintains a chunk-size buffer (64KiB) that must
be filled before any compressed frame can be produced. Our frame encoders avoid
this redundant copy, as well as freeing 192KiB per endpoint.
*Inbound* ByteToMessageDecoder has a number of inefficiencies:
* Partially parsed messages retain the bytes of all prior messages that arrived
together
* Bytes completing a prior partial message are copied to the end of these
earlier bytes alongside any following message bytes that arrived off the
network together
Our frame decoders guarantee only to copy the number of bytes necessary to
parse a frame, and to never store more bytes than necessary. This improvement
applies twice to LZ4 connections, improving both the message decode and the LZ4
frame decode.
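A minimal sketch of the inbound idea, under stated assumptions: frames are taken to be a 4 byte length prefix followed by the payload, and {{StitchingDecoderSketch}}, {{onBytes}} and {{bytesCopied}} are hypothetical names, not classes from the patch. A frame that arrives whole inside one network chunk is handed out as a slice with no copy; only a frame that straddles chunks is copied, into a buffer sized for exactly that frame.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: the 4-byte length prefix and all names are assumptions.
final class StitchingDecoderSketch {
    private ByteBuffer partial;   // bytes of the single incomplete frame, if any
    private boolean lengthKnown;  // whether 'partial' is already sized for the whole frame
    long bytesCopied;             // bytes copied out of incoming chunks

    List<ByteBuffer> onBytes(ByteBuffer in) {
        List<ByteBuffer> payloads = new ArrayList<>();
        while (in.hasRemaining()) {
            if (partial == null) {
                if (in.remaining() >= 4) {
                    int frameSize = 4 + in.getInt(in.position());
                    if (in.remaining() >= frameSize) {
                        // Frame lies wholly inside this chunk: slice it, copy nothing.
                        ByteBuffer view = in.duplicate();
                        view.position(in.position() + 4);
                        view.limit(in.position() + frameSize);
                        payloads.add(view.slice());
                        in.position(in.position() + frameSize);
                        continue;
                    }
                    partial = ByteBuffer.allocate(frameSize);
                    lengthKnown = true;
                } else {
                    partial = ByteBuffer.allocate(4); // not even the length prefix is complete
                    lengthKnown = false;
                }
            }
            // Copy only what the incomplete frame (or its length prefix) still needs.
            int n = Math.min(in.remaining(), partial.remaining());
            ByteBuffer source = in.duplicate();
            source.limit(in.position() + n);
            partial.put(source);
            in.position(in.position() + n);
            bytesCopied += n;
            if (!partial.hasRemaining() && !lengthKnown) {
                // Length prefix now complete: move it into a buffer of exactly the frame size.
                ByteBuffer sized = ByteBuffer.allocate(4 + partial.getInt(0));
                partial.flip();
                sized.put(partial);
                partial = sized;
                lengthKnown = true;
            }
            if (!partial.hasRemaining()) {
                partial.position(4);
                payloads.add(partial.slice());
                partial = null;
            }
        }
        return payloads;
    }
}
```

Feeding three frames in two chunks, where only the second frame straddles the chunk boundary, copies exactly the bytes of that second frame; the other two are delivered as views over the incoming chunks.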
> Improvements to Internode Messaging
> -----------------------------------
>
> Key: CASSANDRA-15066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15066
> Project: Cassandra
> Issue Type: Improvement
> Components: Messaging/Internode
> Reporter: Benedict
> Assignee: Benedict
> Priority: Normal
> Fix For: 4.0
>
>
> CASSANDRA-8457 introduced asynchronous networking to internode messaging, but
> there have been several follow-up endeavours to improve some semantic issues.
> CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were
> combined some months ago into a single overarching refactor of the original
> work, to address some of the issues that have been discovered. Given the
> criticality of this work to the project, we wanted to bring some more eyes to
> bear to ensure the release goes ahead smoothly. In doing so, we uncovered a
> number of issues with messaging, some of them long-standing, that we felt
> needed to be addressed. This patch widens the scope of CASSANDRA-14503 and
> CASSANDRA-13630 in an effort to close the book on the messaging service, at
> least for the foreseeable future.
> The patch includes a number of clarifying refactors that touch outside of the
> {{net.async}} package, and a number of semantic changes to the {{net.async}}
> package itself. We believe it clarifies the intent and behaviour of the
> code while improving system stability, which we will outline in comments
> below.
> https://github.com/belliottsmith/cassandra/tree/messaging-improvements