[ 
https://issues.apache.org/jira/browse/CASSANDRA-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582286#comment-16582286
 ] 

Benedict commented on CASSANDRA-14568:
--------------------------------------

OK. So, [here|https://github.com/belliottsmith/cassandra/commits/14568-2] is 
may second attempt at fixing this.

In the process of adding improved assertion logic, I realised we might have 
another bug around dropped static collection column, that could have resulted 
in decoding a static collection deletion as a whole-static-row deletion (with 
unknown semantics, since I vaguely recall that our correctness in some places 
depends on there being no such deletions).  

In essence: if on looking up the collectionNameBytes, we found no 
collectionName (due, for instance, to it having been dropped), we would be left 
with a only a complete static row bound to construct.

We clearly need to introduce some upgrade dtests to cover these cases as well

> Static collection deletions are corrupted in 3.0 -> 2.{1,2} messages
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-14568
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14568
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 3.0.17, 3.11.3
>
>
> In 2.1 and 2.2, row and complex deletions were represented as range 
> tombstones.  LegacyLayout is our compatibility layer, that translates the 
> relevant RT patterns in 2.1/2.2 to row/complex deletions in 3.0, and vice 
> versa.  Unfortunately, it does not handle the special case of static row 
> deletions, they are treated as regular row deletions. Since static rows are 
> themselves never directly deleted, the only issue is with collection 
> deletions.
> Collection deletions in 2.1/2.2 were encoded as a range tombstone, consisting 
> of a sequence of the clustering keys’ data for the affected row, followed by 
> the bytes representing the name of the collection column.  STATIC_CLUSTERING 
> contains zero clusterings, so by treating the deletion as for a regular row, 
> zero clusterings are written to precede the column name of the erased 
> collection, so the column name is written at position zero.
> This can exhibit itself in at least two ways:
>  # If the type of your first clustering key is a variable width type, new 
> deletes will begin appearing covering the clustering key represented by the 
> column name.
>  ** If you have multiple clustering keys, you will receive a RT covering all 
> those rows with a matching first clustering key.
>  ** This RT will be valid as far as the system is concerned, and go 
> undetected unless there are outside data quality checks in place.
>  # Otherwise, an invalid size of data will be written to the clustering and 
> sent over the network to the 2.1 node.
>  ** The 2.1/2.2 node will handle this just fine, even though the record is 
> junk.  Since it is a deletion covering impossible data, there will be no 
> user-API visible effect.  But if received as a write from a 3.0 node, it will 
> dutifully persist the junk record.
>  ** The 3.0 node that originally sent this junk, may later coordinate a read 
> of the partition, and will notice a digest mismatch, read-repair and 
> serialize the junk to disk
>  ** The sstable containing this record is now corrupt; the deserialization 
> expects fixed-width data, but it encounters too many (or too few) bytes, and 
> is now at an incorrect position to read its structural information
>  ** (Alternatively when the 2.1 node is upgraded this will occur on eventual 
> compaction)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to