[
https://issues.apache.org/jira/browse/CASSANDRA-13304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923191#comment-15923191
]
Michael Kjellman commented on CASSANDRA-13304:
----------------------------------------------
[~beobal] just finished going thru your changes.
* Although there were only the NoOp and LZ4 implementations, and that looked
kinda silly, I did it that way because I wanted to make it easy to switch in
an LZFSE implementation or something later. Do we really want to throw
everything into a single class and then have if conditions all over the place?
(See the rough sketch after this list for what I mean.)
* Re: CRC32()... Cassandra itself has already gone CRC32 -> Adler32 -> CRC32,
so I didn't want to push us into sticking with CRC32 forever just because it's
currently the fastest in JDK8. I know there is a bunch of work going into
Adler32 acceleration already in JDK9...
* Re: Protocol version: That's great. It was just not clear what features were
in "beta" etc and whether we really could bump it. I think it's still worth
keeping a summary of the things added in each version in the code -- I know
it's not there now, but this has always been very helpful in places like
{{org.apache.cassandra.io.sstable.format.big.BigFormat.BigVersion}}.
Additionally -- why would V5 be marked as "beta"? I don't really understand
what would be defined as a beta feature then. I think this should be an
official change that goes into 4.0, not something to "try" out...
* I understand switching from decompress/compress ->
transformInbound/transformOutbound to make things a bit clearer in the case
where we aren't actually doing compression (and are just using it for
checksum'ing), but I feel we've now made it just as unclear in the other case,
where the code really does do decompression/compression... no?
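To illustrate the first point: here's a rough sketch (hypothetical names, not
the exact classes in the attached patch) of keeping one implementation class
per algorithm behind a small interface, rather than one class full of if
conditions:
{code:java}
// Sketch only -- ChunkCompressor / NoOpChunkCompressor are illustrative names.
import java.util.Arrays;

interface ChunkCompressor
{
    byte[] compressChunk(byte[] input, int offset, int length);
    byte[] decompressChunk(byte[] input, int expectedLength);
}

// Pass-through implementation for connections that only want checksumming.
class NoOpChunkCompressor implements ChunkCompressor
{
    public byte[] compressChunk(byte[] input, int offset, int length)
    {
        return Arrays.copyOfRange(input, offset, offset + length);
    }

    public byte[] decompressChunk(byte[] input, int expectedLength)
    {
        return input;
    }
}
// An LZ4 implementation (or an LZFSE one later) is then just another class
// implementing the same two methods, not more branches in a shared code path.
{code}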
> Add checksumming to the native protocol
> ---------------------------------------
>
> Key: CASSANDRA-13304
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13304
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Michael Kjellman
> Assignee: Michael Kjellman
> Labels: client-impacting
> Attachments: 13304_v1.diff
>
>
> The native binary transport implementation doesn't include checksums. This
> makes it highly susceptible to silently inserting corrupted data, whether due
> to hardware issues causing bit flips on the sender/client side, on the
> C*/receiver side, or on the network in between.
> Attaching an implementation that makes checksum'ing mandatory (assuming both
> client and server know about a protocol version that supports checksums) --
> and that also adds checksumming for clients that request compression.
> The serialized format looks something like this:
> {noformat}
> * 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
> * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> * | Number of Compressed Chunks | Compressed Length (e1) /
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> * / Compressed Length cont. (e1) | Uncompressed Length (e1) /
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> * | Uncompressed Length cont. (e1)| CRC32 Checksum of Lengths (e1)|
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> * | Checksum of Lengths cont. (e1)| Compressed Bytes (e1) +//
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> * | CRC32 Checksum (e1) ||
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> * | Compressed Length (e2) |
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> * | Uncompressed Length (e2) |
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> * | CRC32 Checksum of Lengths (e2) |
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> * | Compressed Bytes (e2) +//
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> * | CRC32 Checksum (e2) ||
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> * | Compressed Length (en) |
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> * | Uncompressed Length (en) |
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> * | CRC32 Checksum of Lengths (en) |
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> * | Compressed Bytes (en) +//
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> * | CRC32 Checksum (en) ||
> * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> {noformat}
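> As a rough illustration of the per-chunk layout above (a sketch only, not the
> exact serialization code in the attached patch; the chunk count is written
> once at the front of the frame body, before the first chunk):
> {code:java}
> // Sketch: one chunk = [compressed length][uncompressed length]
> // [CRC32 of those two lengths][compressed bytes][CRC32 of the compressed bytes].
> import java.nio.ByteBuffer;
> import java.util.zip.CRC32;
>
> final class ChunkWriter
> {
>     static ByteBuffer writeChunk(byte[] compressed, int uncompressedLength)
>     {
>         ByteBuffer lengths = ByteBuffer.allocate(8);
>         lengths.putInt(compressed.length).putInt(uncompressedLength);
>         lengths.flip();
>
>         CRC32 lengthsCrc = new CRC32();
>         lengthsCrc.update(lengths.duplicate());
>
>         CRC32 payloadCrc = new CRC32();
>         payloadCrc.update(compressed, 0, compressed.length);
>
>         ByteBuffer out = ByteBuffer.allocate(8 + 4 + compressed.length + 4);
>         out.put(lengths);
>         out.putInt((int) lengthsCrc.getValue());
>         out.put(compressed);
>         out.putInt((int) payloadCrc.getValue());
>         out.flip();
>         return out;
>     }
> }
> {code}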
> The first pass here adds checksums only to the actual contents of the frame
> body itself (and doesn't actually checksum lengths and headers). While it
> would be great to fully add checksumming across the entire protocol, the
> proposed implementation will ensure we at least catch corrupted data and
> likely protect ourselves pretty well anyways.
> I didn't go to the trouble of implementing a Snappy Checksum'ed Compressor
> implementation as Snappy has been deprecated for a while -- it's really slow
> and crappy compared to LZ4 -- and we should do everything in our power to
> make sure no one in the community is still using it. I left it in (for
> obvious backwards compatibility reasons) for old clients that don't know
> about the new protocol.
> The current protocol has a 256MB (max) frame body, where the serialized
> contents are simply written into the frame body.
> If the client sends a compression option in the STARTUP message, we install
> a FrameCompressor inline. Unfortunately, we made the decision to treat the
> frame body separately from the header bits etc in a given message. So,
> instead, we put a compressor implementation in the options and then, if it's
> not null, we push the serialized bytes for the frame body *only* through the
> given FrameCompressor implementation. The existing implementations simply
> provide all the bytes for the frame body in one go to the compressor
> implementation and then serialize it with the length of the compressed bytes
> up front.
> Unfortunately, that won't work for checksum'ing, for obvious reasons: we
> can't naively just checksum the entire (potentially) 256MB frame body and
> slap a single checksum at the end.
> The best place to start with the changes is {{ChecksumedCompressor}}. I
> implemented the checksumming (and the chunking logic required to support it)
> in one single place; implementations of ChecksumedCompressor only implement
> the actual calls to the given compression algorithm for the provided bytes.
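> Roughly, the shape is something like the following sketch (the method names
> here are illustrative, not necessarily the ones in the patch): the base class
> owns chunking and checksumming, and subclasses only know how to (de)compress
> a single chunk.
> {code:java}
> import java.util.zip.Checksum;
>
> abstract class ChecksumedCompressorSketch
> {
>     protected final Checksum checksum;
>     protected final int chunkSize;
>
>     protected ChecksumedCompressorSketch(Checksum checksum, int chunkSize)
>     {
>         this.checksum = checksum;
>         this.chunkSize = chunkSize;
>     }
>
>     // Shared logic (omitted): split the frame body into chunkSize pieces,
>     // compress each piece via compressChunk(), and write the per-chunk
>     // lengths and checksums described in the layout above.
>
>     protected abstract byte[] compressChunk(byte[] input, int offset, int length);
>     protected abstract byte[] decompressChunk(byte[] input, int uncompressedLength);
> }
> {code}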
> Although the interface takes a {{Checksum}}, the attached patch currently
> uses CRC32 everywhere. Given that JDK8+ can do the calculation with the Intel
> instruction set, CRC32 is about as fast as we can get right now.
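> Because both CRC32 and Adler32 implement {{java.util.zip.Checksum}}, swapping
> the algorithm later shouldn't require touching the chunking or serialization
> code -- something along these lines (illustrative only):
> {code:java}
> import java.util.zip.Adler32;
> import java.util.zip.CRC32;
> import java.util.zip.Checksum;
>
> final class Checksums
> {
>     // Pick the algorithm once; everything downstream just sees a Checksum.
>     static Checksum newChecksum(boolean useCrc32)
>     {
>         return useCrc32 ? new CRC32() : new Adler32();
>     }
>
>     static int checksumOf(Checksum checksum, byte[] bytes, int offset, int length)
>     {
>         checksum.reset();
>         checksum.update(bytes, offset, length);
>         return (int) checksum.getValue();
>     }
> }
> {code}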
> I went with a 32kb "default" for the chunk size -- meaning we will chunk the
> entire frame body into 32kb chunks, compress each one of those chunks, and
> checksum each chunk. After discussing with a bunch of people and researching
> how checksums actually work and how much data they will protect etc -- if we
> use 32kb chunks with CRC32 we can catch up to 32 bits flipped in a row (but,
> more importantly, catch the more likely corruption where a single bit is
> flipped) with pretty high certainty. 64kb seems to introduce too much of a
> probability of missing corruption.
> The maximum block size LZ4 operates on is a 64kb chunk, so that -- combined
> with the need to make sure the CRC32 checksums are actually going to catch
> corruption -- made chunking at 32kb seem like a reasonable value when
> weighing both checksums and compression (to ensure we don't kill our
> compression ratio etc).
> I'm not including client changes here -- I asked around and I'm not really
> sure what the policy there is -- do we update the python driver? java driver?
> how has the timing of this stuff been handled in the past?