[
https://issues.apache.org/jira/browse/CASSANDRA-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613724#comment-16613724
]
Benedict commented on CASSANDRA-14568:
--------------------------------------
[3.0|https://github.com/belliottsmith/cassandra/tree/14568]
[CI|https://circleci.com/workflow-run/70397395-3585-4b4c-904f-a55c070cf359]
Thanks both for your review. I have split out CASSANDRA-14749, and pushed an
updated patch that simply omits that part. If either of you could give it a
quick +1, I'll commit them both.
> Static collection deletions are corrupted in 3.0 -> 2.{1,2} messages
> --------------------------------------------------------------------
>
> Key: CASSANDRA-14568
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14568
> Project: Cassandra
> Issue Type: Bug
> Reporter: Benedict
> Assignee: Benedict
> Priority: Critical
> Fix For: 3.0.17, 3.11.3
>
>
> In 2.1 and 2.2, row and complex deletions were represented as range
> tombstones. LegacyLayout is our compatibility layer that translates the
> relevant RT patterns in 2.1/2.2 to row/complex deletions in 3.0, and vice
> versa. Unfortunately, it does not handle the special case of static row
> deletions; they are treated as regular row deletions. Since static rows are
> themselves never directly deleted, the only issue is with collection
> deletions.
> Collection deletions in 2.1/2.2 were encoded as a range tombstone consisting
> of a sequence of the clustering keys’ data for the affected row, followed by
> the bytes representing the name of the collection column. STATIC_CLUSTERING
> contains zero clusterings, so when the deletion is treated as one for a
> regular row, no clusterings are written before the name of the deleted
> collection column, and the column name ends up at position zero.
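The layout described above can be illustrated with a deliberately simplified model. This is only a sketch: it represents the tombstone bound as a list of components rather than the real 2.1/2.2 wire format, and the function name `legacy_deletion_components` and all values are hypothetical, not taken from LegacyLayout.

```python
from typing import List, Sequence


def legacy_deletion_components(clustering: Sequence[bytes],
                               collection_name: bytes) -> List[bytes]:
    """Components of a simplified 2.1/2.2-style collection deletion bound:
    the row's clustering values, followed by the collection column name.
    (Hypothetical helper; not the actual Cassandra encoding.)"""
    return [*clustering, collection_name]


# Regular row in a table with two clustering columns (illustrative values).
regular = legacy_deletion_components([b"2018-09-13", b"eu-west"], b"tags")

# Static row handled as if it were a regular row: zero clustering values.
static_as_regular = legacy_deletion_components([], b"tags")

print(regular)            # column name follows both clustering values
print(static_as_regular)  # column name is the component at position zero
```

In the second case, a reader that expects the first clustering value at position zero would find the collection column name there instead.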
> This can manifest itself in at least two ways:
> # If the type of your first clustering key is a variable-width type, new
> deletes will begin appearing covering the clustering key represented by the
> column name.
> ** If you have multiple clustering keys, you will receive an RT covering all
> those rows with a matching first clustering key.
> ** This RT will be valid as far as the system is concerned, and go
> undetected unless there are outside data quality checks in place.
> # Otherwise, an invalid size of data will be written to the clustering and
> sent over the network to the 2.1 node.
> ** The 2.1/2.2 node will handle this just fine, even though the record is
> junk. Since it is a deletion covering impossible data, there will be no
> user-API visible effect. But if received as a write from a 3.0 node, it will
> dutifully persist the junk record.
> ** The 3.0 node that originally sent this junk may later coordinate a read
> of the partition, notice a digest mismatch, read-repair, and serialize the
> junk to disk.
> ** The sstable containing this record is now corrupt; the deserialization
> expects fixed-width data, but it encounters too many (or too few) bytes, and
> is now at an incorrect position to read its structural information.
> ** (Alternatively, when the 2.1 node is upgraded, this will occur on eventual
> compaction.)
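The two outcomes listed above can be sketched as a toy classifier. It assumes a reader that takes whatever sits at position zero as the first clustering value; the function `read_position_zero`, the column name `labels`, and the 4-byte width are hypothetical and are not part of Cassandra.

```python
from typing import Optional


def read_position_zero(component: bytes, fixed_width: Optional[int]) -> str:
    """Classify how a reader expecting the first clustering value at position
    zero would see `component`. fixed_width=None models a variable-width type.
    (Hypothetical helper for illustration only.)"""
    if fixed_width is None:
        # Any byte length is acceptable, so the column name passes as a
        # clustering value and the tombstone looks valid.
        return "valid-looking range tombstone"
    if len(component) != fixed_width:
        return "invalid size: expected %d bytes, got %d" % (
            fixed_width, len(component))
    # The length happens to match; the issue text does not discuss this case.
    return "valid-looking range tombstone"


misplaced_name = b"labels"  # collection column name written at position zero

print(read_position_zero(misplaced_name, None))  # variable-width first key
print(read_position_zero(misplaced_name, 4))     # fixed 4-byte first key
```

The first call corresponds to the silent, valid-looking deletion; the second corresponds to the junk record whose size no longer matches what a fixed-width deserializer expects.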