Michael Kjellman created CASSANDRA-13304:
--------------------------------------------

             Summary: Add checksumming to the native protocol
                 Key: CASSANDRA-13304
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13304
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Michael Kjellman
            Assignee: Michael Kjellman


The native binary transport implementation doesn't include checksums. This 
makes it highly susceptible to silently inserting corrupted data, whether due to 
hardware issues causing bit flips on the sender/client side, on the C*/receiver 
side, or on the network in between.

Attaching an implementation that makes checksumming mandatory (assuming both 
client and server know about a protocol version that supports checksums) -- and 
that also applies checksumming to clients that request compression.

The serialized format looks something like this:
{noformat}
 *                      1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
 *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |  Number of Compressed Chunks  |     Compressed Length (e1)    /
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * /  Compressed Length cont. (e1) |    Uncompressed Length (e1)   /
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * | Uncompressed Length cont. (e1)| CRC32 Checksum of Lengths (e1)|
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * | Checksum of Lengths cont. (e1)|    Compressed Bytes (e1)    +//
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |                      CRC32 Checksum (e1)                     ||
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |                    Compressed Length (e2)                     |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |                   Uncompressed Length (e2)                    |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |                CRC32 Checksum of Lengths (e2)                 |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |                     Compressed Bytes (e2)                   +//
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |                      CRC32 Checksum (e2)                     ||
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |                    Compressed Length (en)                     |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |                   Uncompressed Length (en)                    |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |                CRC32 Checksum of Lengths (en)                 |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |                      Compressed Bytes (en)                  +//
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |                      CRC32 Checksum (en)                     ||
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
{noformat}
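
To make the layout concrete, here is a minimal write-side sketch of a single chunk -- this is not the attached patch; the {{ChunkWriter}} name, the {{ByteBuffer}} output, and the method signature are just assumptions for illustration:
{code:java}
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

final class ChunkWriter
{
    // Writes the per-chunk fields from the diagram: compressed length,
    // uncompressed length, CRC32 of the two lengths, the compressed bytes,
    // and finally the CRC32 of the compressed bytes.
    static void writeChunk(ByteBuffer out, byte[] compressed, int uncompressedLength)
    {
        CRC32 crc = new CRC32();

        out.putInt(compressed.length);    // Compressed Length (e_n)
        out.putInt(uncompressedLength);   // Uncompressed Length (e_n)

        // checksum covering only the two length fields
        byte[] lengths = ByteBuffer.allocate(8)
                                   .putInt(compressed.length)
                                   .putInt(uncompressedLength)
                                   .array();
        crc.update(lengths, 0, lengths.length);
        out.putInt((int) crc.getValue()); // CRC32 Checksum of Lengths (e_n)

        out.put(compressed);              // Compressed Bytes (e_n)

        crc.reset();
        crc.update(compressed, 0, compressed.length);
        out.putInt((int) crc.getValue()); // CRC32 Checksum (e_n)
    }
}
{code}
The 16-bit "Number of Compressed Chunks" field from the first row would be written once before the first chunk; the sketch above only covers the repeated per-chunk portion.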

The first pass here adds checksums only to the actual contents of the frame 
body itself (it doesn't checksum the protocol lengths and headers). While it 
would be great to eventually add checksumming across the entire protocol, the 
proposed implementation will at least catch corrupted data and should protect 
us pretty well in practice.

I didn't go to the trouble of implementing a Snappy checksummed compressor 
implementation, as Snappy has been deprecated for a while -- it's really slow 
and performs poorly compared to LZ4 -- and we should do everything in our power 
to make sure no one in the community is still using it. I left the existing 
Snappy support in place (for obvious backwards-compatibility reasons) for old 
clients that don't know about the new protocol.

The current protocol has a 256MB (max) frame body, into which the serialized 
contents are simply written.

If the client sends a compression option in the STARTUP message, we install a 
FrameCompressor inline. Unfortunately, the decision was made to treat the frame 
body separately from the header bits etc. in a given message. So instead we put 
a compressor implementation in the options, and if it's not null we push the 
serialized bytes for the frame body *only* through the given FrameCompressor 
implementation. The existing implementations simply hand all the bytes for the 
frame body in one go to the compressor and then serialize the result with the 
length of the compressed bytes up front.
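
To make that existing flow concrete, here is a rough, purely hypothetical shape of the hook -- the interface and method names below are illustrative, not the actual Cassandra classes:
{code:java}
// The compressor negotiated at STARTUP sees only the frame body, never the header.
interface FrameBodyCompressor
{
    byte[] compress(byte[] frameBody) throws java.io.IOException;
    byte[] decompress(byte[] compressedBody, int uncompressedLength) throws java.io.IOException;
}

final class FrameBodyEncoder
{
    static byte[] encodeBody(byte[] serializedBody, FrameBodyCompressor compressor) throws java.io.IOException
    {
        // no compression negotiated: the serialized body goes out untouched
        if (compressor == null)
            return serializedBody;

        // otherwise the entire body -- potentially up to 256MB -- is handed to
        // the compressor in a single call
        return compressor.compress(serializedBody);
    }
}
{code}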

Unfortunately, this won't work for checksumming, for obvious reasons: we can't 
naively checksum the entire (potentially) 256MB frame body and slap a single 
checksum at the end.

The best place to start with the changes is {{ChecksumedCompressor}}. It gives 
us one single place to perform the checksumming, along with the chunking logic 
required to support it. Implementations of ChecksumedCompressor only implement 
the actual calls to the given compression algorithm for the provided bytes.
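
A hypothetical sketch of that shape (the attached {{ChecksumedCompressor}} will differ in its actual signatures): the base class owns the chunking and checksumming, while subclasses supply only the raw compression call.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

abstract class ChecksumedCompressorSketch
{
    static final int DEFAULT_CHUNK_SIZE = 32 * 1024; // 32kb chunks (see below)

    // the only thing a concrete implementation (e.g. one backed by LZ4) provides
    protected abstract byte[] compressChunk(byte[] input, int offset, int length);

    // chunking and checksumming happen here, once, for every implementation
    public List<Chunk> chunkAndChecksum(byte[] frameBody)
    {
        Checksum checksum = new CRC32();
        List<Chunk> chunks = new ArrayList<>();
        for (int offset = 0; offset < frameBody.length; offset += DEFAULT_CHUNK_SIZE)
        {
            int length = Math.min(DEFAULT_CHUNK_SIZE, frameBody.length - offset);
            byte[] compressed = compressChunk(frameBody, offset, length);
            checksum.reset();
            checksum.update(compressed, 0, compressed.length);
            chunks.add(new Chunk(compressed, length, (int) checksum.getValue()));
        }
        return chunks;
    }

    // each chunk would then be serialized in the wire layout shown above
    static final class Chunk
    {
        final byte[] compressed;
        final int uncompressedLength;
        final int crc;

        Chunk(byte[] compressed, int uncompressedLength, int crc)
        {
            this.compressed = compressed;
            this.uncompressedLength = uncompressedLength;
            this.crc = crc;
        }
    }
}
{code}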

Although the interface takes a {{Checksum}}, the attached patch currently uses 
CRC32 everywhere. Given that JDK8+ can do the calculation with hardware support 
via the Intel instruction set, CRC32 is about as fast as we can get right now.
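
For reference, {{java.util.zip.CRC32}} already implements the {{Checksum}} interface, so swapping the algorithm later is just a constructor change; a trivial usage example:
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class Crc32Example
{
    public static void main(String[] args)
    {
        byte[] chunk = "example frame body chunk".getBytes(StandardCharsets.UTF_8);

        // CRC32 is the JDK class that benefits from hardware assistance on JDK8+
        Checksum checksum = new CRC32();
        checksum.update(chunk, 0, chunk.length);

        // the 32-bit CRC is returned zero-extended in a long
        System.out.printf("crc32 = 0x%08x%n", checksum.getValue());
    }
}
{code}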

I went with a 32kb "default" chunk size -- meaning we chunk the entire frame 
body into 32kb chunks, compress each of those chunks, and checksum each chunk. 
After discussing with a bunch of people and researching how checksums actually 
work and how much data they can protect: with 32kb chunks and CRC32 we can 
catch a burst of up to 32 flipped bits in a row (and, more importantly, catch 
the more likely corruption where a single bit is flipped) with pretty high 
certainty. 64kb chunks seem to introduce too high a probability of missing 
corruption.
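
As a back-of-the-envelope check on the overhead this chunking adds (assuming the 16 bytes of per-chunk framing implied by the layout above: two 4-byte lengths plus two 4-byte CRC32 values):
{code:java}
public class ChunkOverhead
{
    public static void main(String[] args)
    {
        long maxFrameBody = 256L * 1024 * 1024; // 256MB maximum frame body
        int chunkSize = 32 * 1024;              // 32kb chunks

        long chunks = (maxFrameBody + chunkSize - 1) / chunkSize;
        long overheadBytes = chunks * 16;       // lengths + checksums per chunk

        // => 8192 chunks and 128kb of framing, i.e. under 0.05% of the body before compression
        System.out.printf("chunks = %d, framing overhead = %d bytes (%.4f%%)%n",
                          chunks, overheadBytes, 100.0 * overheadBytes / maxFrameBody);
    }
}
{code}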

The maximum block size LZ4 operates on is a 64kb chunk, so this, combined with 
the need to make sure the CRC32 checksums will actually catch corruption, made 
32kb seem like a reasonable value when weighing both checksums and compression 
(to ensure we don't kill our compression ratio, etc.).

I'm not including client changes here -- I asked around and I'm not really sure 
what the policy is: do we update the python driver? The java driver? How has 
the timing of this kind of change been handled in the past?



