[ https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226898#comment-17226898 ]
Sam Tunnicliffe commented on CASSANDRA-15299: --------------------------------------------- Thanks, [~ifesdjeen], the flow chart is definitely useful (and looks correct to me, aside from the missing labels that Caleb mentioned). Re: your comments... {quote}in CQLMessageHeader#processLargeMessage, we seem to be creating a LargeMessage object to call processCqlFrame() over a frame that is assembled from a single byte buffer. Maybe we can just call processCqlFrame directly?{quote} thanks, removed in {{ce6223cb}} {quote}this is probably something that we should address outside this ticket, but still: "corrupt frame recovered" seems to be slightly misleading wording, same as Frame#recoverable. We can not recover the frame itself, we just can skip/drop it. Maybe we can rename this, along with metrics in Messaging, before they become public in 4.0.{quote} I tend to agree, maybe something which conveys that the frames are skipped/dropped rather than recovered would be better? We can open a new JIRA for this. {quote}in CQLMessageHeader#processCqlFrame, we only call handleErrorAndRelease. However, it may theoretically happen that we fail before we finish messageDecoder.decode(channel, frame). Maybe we can do something like the [1], to make it consistent with what we do in ProtocolDecoder#decode?{quote} Good call, added in {{8dbf7e42}}. {quote}in CQLMessageHeader$LargeMessage#onComplete, we wrap processCqlFrame(assembleFrame() call in try/catch which is identical to try/catch block from processCqlFrame itself. Just checking if this is intended.{quote} no, it was a redundancy left over from earlier, removed it in {{8dbf7e42}}. {quote}in Dispatcher#processRequest, we can slightly simplify code by declaring FlushItem<?> flushItem variable outside try/catch block and assigning a corresponding value in try or catch, and only calling flush once.{quote} done in {{8dbf7e42}} {quote}during sever bootstrap initialization, we're using deprecated low/high watermark child options, probably we should use .childOption(ChannelOption.WRITE_BUFFER_WATER_MARK, new WriteBufferWaterMark(8 * 1024, 32 * 1024)) instead.{quote} Thanks, fixed in {{8dbf7e42}} {quote}SimpleClientBurnTest#random is unused{quote} thanks, removed. [~maedhroz], thanks, those are decent improvements and seem to me worth making. {quote}testChangingLimitsAtRuntime has a bunch of places where it checks the same value twice, for instance from ClientResourceLimits.getGlobalLimit() and then DatabaseDescriptor.getNativeTransportMaxConcurrentRequestsInBytes(). Seems like we could do away with the second?{quote} Thanks, it is redundant for the global limit, as the {{ClientResourceLimits}} method just delegates to {{DatabaseDescriptor}}, which was always the case I think. For the per-endpoint limits, it's definitely worth checking both places to ensure existing and new limits observe the same cap. > CASSANDRA-13304 follow-up: improve checksumming and compression in protocol > v5-beta > ----------------------------------------------------------------------------------- > > Key: CASSANDRA-15299 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15299 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Client > Reporter: Aleksey Yeschenko > Assignee: Sam Tunnicliffe > Priority: Normal > Labels: protocolv5 > Fix For: 4.0-alpha > > Attachments: Process CQL Frame.png, V5 Flow Chart.png > > > CASSANDRA-13304 made an important improvement to our native protocol: it > introduced checksumming/CRC32 to request and response bodies. It’s an > important step forward, but it doesn’t cover the entire stream. In > particular, the message header is not covered by a checksum or a crc, which > poses a correctness issue if, for example, {{streamId}} gets corrupted. > Additionally, we aren’t quite using CRC32 correctly, in two ways: > 1. We are calculating the CRC32 of the *decompressed* value instead of > computing the CRC32 on the bytes written on the wire - losing the properties > of the CRC32. In some cases, due to this sequencing, attempting to decompress > a corrupt stream can cause a segfault by LZ4. > 2. When using CRC32, the CRC32 value is written in the incorrect byte order, > also losing some of the protections. > See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for > explanation for the two points above. > Separately, there are some long-standing issues with the protocol - since > *way* before CASSANDRA-13304. Importantly, both checksumming and compression > operate on individual message bodies rather than frames of multiple complete > messages. In reality, this has several important additional downsides. To > name a couple: > # For compression, we are getting poor compression ratios for smaller > messages - when operating on tiny sequences of bytes. In reality, for most > small requests and responses we are discarding the compressed value as it’d > be smaller than the uncompressed one - incurring both redundant allocations > and compressions. > # For checksumming and CRC32 we pay a high overhead price for small messages. > 4 bytes extra is *a lot* for an empty write response, for example. > To address the correctness issue of {{streamId}} not being covered by the > checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we > should switch to a framing protocol with multiple messages in a single frame. > I suggest we reuse the framing protocol recently implemented for internode > messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, > and that we do it before native protocol v5 graduates from beta. See > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java > and > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org