[jira] [Commented] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

Olivier Michallat (Jira) Mon, 04 May 2020 17:54:09 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099447#comment-17099447
 ]


Olivier Michallat commented on CASSANDRA-15299:
-----------------------------------------------

I work on the Java driver. Here are a few comments/questions on your branch 
(let me know if there is a better medium for this).

1) Name clash:
{quote}The introduction of this new format, brings an unfortunate naming clash 
between frames from the previous format [...] and the new outer frames. What 
was previously referred to as a Frame could perhaps be better described as an 
Envelope or Container [...] Prior to making any conclusion here though, for the 
purposes of this document, legacy frames are referred to as CQL frames and the 
new enclosures as outer frames
{quote}
You're hinting at renaming at a later point. I think that will be most welcome, 
the names as they are now are pretty confusing. And IMHO you should rename the 
outer container: even though "frame" is better suited for it, it's more 
important to preserve familiarity for people coming from the legacy code.

2) CRC check failures: this will result in either dropping the corrupt frame, 
or the whole connection (comments in {{InboundMessageHandler}}).

>From a client point of view, a dropped frame will result in request timeouts. 
>We have no way of providing a better error, since the stream ids of the failed 
>requests are in the corrupt payload. I'm wondering if it might not be better 
>to drop the connection all the time: at least the client gets immediate 
>feedback (we could try to propagate a cause), instead of a bunch of requests 
>timing out for no apparent reason.

3) When two large messages are sent in a row, how will you tell the last outer 
frame of the first message from the first outer frame of the second? Don't you 
need some sort of {{is_last_frame}} flag in the header?

4) When compression is enabled, how are you going to test when the payload 
reaches the max size? This would have to be added to the logic in 
{{V5Flusher}}; but all the messages in the payload are compressed together, so 
you can't check the size as you go like you do for the uncompressed version. I 
was thinking of maybe compressing all available results and splitting in 
multiple uncontained frames if the compressed payload is too big, but that 
seems at odds with this invariant in {{InboundMessageHandler}}:
{quote}All the bytes in an uncontained frame always belong to a single 
message.{quote}

> CASSANDRA-13304 follow-up: improve checksumming and compression in protocol 
> v5-beta
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15299
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15299
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Client
>            Reporter: Aleksey Yeschenko
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>              Labels: protocolv5
>             Fix For: 4.0-beta
>
>
> CASSANDRA-13304 made an important improvement to our native protocol: it 
> introduced checksumming/CRC32 to request and response bodies. It’s an 
> important step forward, but it doesn’t cover the entire stream. In 
> particular, the message header is not covered by a checksum or a crc, which 
> poses a correctness issue if, for example, {{streamId}} gets corrupted.
> Additionally, we aren’t quite using CRC32 correctly, in two ways:
> 1. We are calculating the CRC32 of the *decompressed* value instead of 
> computing the CRC32 on the bytes written on the wire - losing the properties 
> of the CRC32. In some cases, due to this sequencing, attempting to decompress 
> a corrupt stream can cause a segfault by LZ4.
> 2. When using CRC32, the CRC32 value is written in the incorrect byte order, 
> also losing some of the protections.
> See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for 
> explanation for the two points above.
> Separately, there are some long-standing issues with the protocol - since 
> *way* before CASSANDRA-13304. Importantly, both checksumming and compression 
> operate on individual message bodies rather than frames of multiple complete 
> messages. In reality, this has several important additional downsides. To 
> name a couple:
> # For compression, we are getting poor compression ratios for smaller 
> messages - when operating on tiny sequences of bytes. In reality, for most 
> small requests and responses we are discarding the compressed value as it’d 
> be smaller than the uncompressed one - incurring both redundant allocations 
> and compressions.
> # For checksumming and CRC32 we pay a high overhead price for small messages. 
> 4 bytes extra is *a lot* for an empty write response, for example.
> To address the correctness issue of {{streamId}} not being covered by the 
> checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we 
> should switch to a framing protocol with multiple messages in a single frame.
> I suggest we reuse the framing protocol recently implemented for internode 
> messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, 
> and that we do it before native protocol v5 graduates from beta. See 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java
>  and 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

Reply via email to