[jira] [Commented] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

Sam Tunnicliffe (Jira) Thu, 21 May 2020 02:18:09 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112995#comment-17112995
 ]


Sam Tunnicliffe commented on CASSANDRA-15299:
---------------------------------------------

When we spoke in on slack, and I'm afraid the answers I gave were only 
partially correct. 
https://the-asf.slack.com/archives/CK23JSY2K/p1589982060404400

The good news is that your approach is right and it will correctly decode the 
header, but the diagram is also correct. The disconnect is that the diagram is 
describing the bytes as they are laid out on the wire at the data layer, 
whereas in C* and the Java Driver we're working at a point somewhat higher up 
the stack. That's perhaps not as clear as it could be and I'll definitely work 
on improving the docs in that regard.

To deconstruct your example, you have the bytes in the buffer with a binary 
representation of
{{00000100 00000000 00000010 00000001 11111001 11110010}}

The first 3 bytes {{00000100 00000000 00000010}} are the header length + flag.
{{4 |= (1 << 17) == 131076 == 00000010 00000000 00000100}}
Writing them to the buffer in little endian ordering (in 
{{FrameEncodedCrc::put3b}}):
{{00000100 00000000 00000010}}
and reading them off the wire, where ethernet transmission specifies bytes are 
sent least significant bit first:
{{01000000 00000000 00100000}}
so the 18th bit read off the wire is the {{1}} representing the 
{{selfContained}} flag, but to extract it requires the steps you outlined 
including the endianness switch.

The second 3 bytes in your buffer {{00000001 11111001 11110010}} is the crc24 
in little endian order. Here though, the bit order is less important as up the 
stack we're dealing in whole bytes.
So on the wire we'd see {{10000000 10011111 01001111}}, and read that as 
{{00000001 11111001 11110010}}. 
Then in {{FrameDecoderCrc::readHeader6b}} we change the endianness to give
{{11110010 11111001 00000001 == 15923457 == crc24(131076)}}


> CASSANDRA-13304 follow-up: improve checksumming and compression in protocol 
> v5-beta
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15299
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15299
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Client
>            Reporter: Aleksey Yeschenko
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>              Labels: protocolv5
>             Fix For: 4.0-alpha
>
>
> CASSANDRA-13304 made an important improvement to our native protocol: it 
> introduced checksumming/CRC32 to request and response bodies. It’s an 
> important step forward, but it doesn’t cover the entire stream. In 
> particular, the message header is not covered by a checksum or a crc, which 
> poses a correctness issue if, for example, {{streamId}} gets corrupted.
> Additionally, we aren’t quite using CRC32 correctly, in two ways:
> 1. We are calculating the CRC32 of the *decompressed* value instead of 
> computing the CRC32 on the bytes written on the wire - losing the properties 
> of the CRC32. In some cases, due to this sequencing, attempting to decompress 
> a corrupt stream can cause a segfault by LZ4.
> 2. When using CRC32, the CRC32 value is written in the incorrect byte order, 
> also losing some of the protections.
> See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for 
> explanation for the two points above.
> Separately, there are some long-standing issues with the protocol - since 
> *way* before CASSANDRA-13304. Importantly, both checksumming and compression 
> operate on individual message bodies rather than frames of multiple complete 
> messages. In reality, this has several important additional downsides. To 
> name a couple:
> # For compression, we are getting poor compression ratios for smaller 
> messages - when operating on tiny sequences of bytes. In reality, for most 
> small requests and responses we are discarding the compressed value as it’d 
> be smaller than the uncompressed one - incurring both redundant allocations 
> and compressions.
> # For checksumming and CRC32 we pay a high overhead price for small messages. 
> 4 bytes extra is *a lot* for an empty write response, for example.
> To address the correctness issue of {{streamId}} not being covered by the 
> checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we 
> should switch to a framing protocol with multiple messages in a single frame.
> I suggest we reuse the framing protocol recently implemented for internode 
> messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, 
> and that we do it before native protocol v5 graduates from beta. See 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java
>  and 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

Reply via email to