[
https://issues.apache.org/jira/browse/CASSANDRA-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582245#comment-16582245
]
Benedict commented on CASSANDRA-14568:
--------------------------------------
bq. My concern was that LRT.start.bound is directly referenced in
UnfilteredDeserializer when converting to a RTMarker.
Hmm. Thinking on it some more, I guess this is not a problem, because we never
(in any extant version) actually issue any deletions that would (in 3.0) be
represented as RTs spanning static rows, so the problematic cases *should* all
be converted to collection tombstones only. I will add some
comments to LegacyLayout elaborating the inconsistencies of modern/legacy
static clusterings as part of the patch.
So, I'm now comfortable with fixing either location, I think. Though I need to
code dive a bit more to be absolutely certain.
> Static collection deletions are corrupted in 3.0 -> 2.{1,2} messages
> --------------------------------------------------------------------
>
> Key: CASSANDRA-14568
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14568
> Project: Cassandra
> Issue Type: Bug
> Reporter: Benedict
> Assignee: Benedict
> Priority: Critical
> Fix For: 3.0.17, 3.11.3
>
>
> In 2.1 and 2.2, row and complex deletions were represented as range
> tombstones. LegacyLayout is our compatibility layer, that translates the
> relevant RT patterns in 2.1/2.2 to row/complex deletions in 3.0, and vice
> versa. Unfortunately, it does not handle the special case of static row
> deletions; they are treated as regular row deletions. Since static rows are
> themselves never directly deleted, the only issue is with collection
> deletions.
> Collection deletions in 2.1/2.2 were encoded as a range tombstone, consisting
> of a sequence of the clustering keys’ data for the affected row, followed by
> the bytes representing the name of the collection column. STATIC_CLUSTERING
> contains zero clusterings, so when the deletion is treated as a regular row
> deletion, no clustering components precede the column name of the erased
> collection, and the column name is therefore written at position zero.
> This can manifest in at least two ways:
> # If the type of your first clustering key is a variable-width type, new
> deletes will begin appearing covering the clustering key represented by the
> column name.
> ** If you have multiple clustering keys, you will receive an RT covering all
> those rows with a matching first clustering key.
> ** This RT will be valid as far as the system is concerned, and go
> undetected unless there are outside data quality checks in place.
> # Otherwise, a clustering value of invalid size will be written and sent
> over the network to the 2.1 node.
> ** The 2.1/2.2 node will handle this just fine, even though the record is
> junk. Since it is a deletion covering impossible data, there will be no
> user-API visible effect. But if received as a write from a 3.0 node, it will
> dutifully persist the junk record.
> ** The 3.0 node that originally sent this junk may later coordinate a read
> of the partition, and will notice a digest mismatch, read-repair, and
> serialize the junk to disk.
> ** The sstable containing this record is now corrupt; the deserialization
> expects fixed-width data, but it encounters too many (or too few) bytes, and
> is now at an incorrect position to read its structural information
> ** (Alternatively when the 2.1 node is upgraded this will occur on eventual
> compaction)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]