[ 
https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105536#comment-17105536
 ] 

Sylvain Lebresne commented on CASSANDRA-15805:
----------------------------------------------

To understand why this happens, let me write down the atoms that the example of 
the description generates on 2.X (using a simplified representation that I hope 
is clear enough):
{noformat}
atom1: RT([A:_, A:X:b:_])@1,     // beginning of all 'A' rows to beginning of A:X's b column
atom2: Cell(A:X:)@2,             // row marker for A:X
atom3: Cell(A:X:a=foo)@2,        // value of a in A:X
atom4: RT([A:X:b:_, A:X:b:!])@3, // collection tombstone for b in A:X
atom5: RT([A:X:b:!, A:!])@1,     // remainder of covering RT, from end of b in A:X to end of all 'A' rows
atom6: Cell(A:X:c=bar)@2         // value of c in A:X
{noformat}
Those atoms are deserialized into {{LegacyCell}} and {{LegacyRangeTombstone}} 
on 3.X as:
{noformat}
atom1: RT(Bound(INCL_START_BOUND(A), collection=null)-Bound(EXCL_END_BOUND(A:B), collection=null), deletedAt=1, localDeletion=1589204864)
atom2: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=null, collElt=null), v=, ts=2, ldt=2147483647, ttl=0)
atom3: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=a, collElt=null), v=foo, ts=2, ldt=2147483647, ttl=0)
atom4: RT(Bound(INCL_START_BOUND(A:X), collection=b)-Bound(INCL_END_BOUND(A:X), collection=b), deletedAt=3, localDeletion=1589204864)
atom5: RT(Bound(EXCL_START_BOUND(A:X), collection=null)-Bound(INCL_END_BOUND(A), collection=null), deletedAt=1, localDeletion=1589204864)
atom6: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=c, collElt=null), v=bar, ts=2, ldt=2147483647, ttl=0)
{noformat}

I'll point out that those are a direct translation of the 2.X atoms except for 
{{atom1}} and {{atom5}} that are slightly different:
* instead of {{atom1}} stopping at the beginning of the row's {{b}} column, it 
extends to the end of the row.
* and instead of {{atom5}} starting after that {{b}} column, it starts after the 
row. Do note however that the order of atoms is still the one above, so that 
atom is effectively out-of-order.

The reason for those differences is the logic [at the beginning of 
{{LegacyLayout.RangeTombstone}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1883],
 which the comment there tries to explain. In short, the legacy layer has to 
map every 2.X RT into either a 3.X range tombstone (so one over multiple 
rows), a row tombstone or a collection one.

Anyway, as mentioned above, the problem is that {{atom5}} is out of order.  
What currently happens is that when {{atom5}} is encountered by 
{{UnfilteredDeserializer.OldFormatDeserializer}}, it will be passed to the 
{{CellGrouper}} currently grouping the row, and will end up in the 
[{{CellGrouper#addGenericTombstone}} 
method|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1544].
  But, because that atom starts strictly after the row being grouped, the 
method returns {{false}} and the row is generated a first time. Later, we get 
{{atom6}} which restarts the row with the value of column {{c}}, after which it 
is generated a second time.
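
To make that sequence of events concrete, here is a minimal, self-contained sketch of the grouping behaviour described above. It is a toy model only: {{Atom}}, {{groupRows}} and {{exampleAtoms}} are hypothetical names invented for illustration and are not the actual {{LegacyLayout}} or {{CellGrouper}} code. The only rule it borrows from the description is that a non-collection tombstone starting strictly after the row being grouped closes that row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DuplicateRowSketch {

    /** Toy stand-in for a legacy atom (assumption: NOT Cassandra's LegacyCell/LegacyRangeTombstone). */
    static final class Atom {
        final boolean isCell;
        final String row;             // clustering of the cell, or row the tombstone relates to
        final String column;          // column for cells, collection for collection tombstones, else null
        final boolean startsAfterRow; // range tombstones only: starts strictly after 'row'

        Atom(boolean isCell, String row, String column, boolean startsAfterRow) {
            this.isCell = isCell;
            this.row = row;
            this.column = column;
            this.startsAfterRow = startsAfterRow;
        }

        static Atom cell(String row, String column) { return new Atom(true, row, column, false); }
        static Atom collectionRt(String row, String collection) { return new Atom(false, row, collection, false); }
        static Atom rangeRt(String row, boolean startsAfterRow) { return new Atom(false, row, null, startsAfterRow); }
    }

    /** The six atoms of the example, in the order they are read. */
    static List<Atom> exampleAtoms() {
        return Arrays.asList(
            Atom.rangeRt("A", false),       // atom1: covering RT, seen before any row is open
            Atom.cell("A:X", null),         // atom2: row marker
            Atom.cell("A:X", "a"),          // atom3
            Atom.collectionRt("A:X", "b"),  // atom4: collection tombstone for b
            Atom.rangeRt("A:X", true),      // atom5: starts strictly after row A:X (out of order)
            Atom.cell("A:X", "c"));         // atom6
    }

    /** Groups atoms into rows; each result is rendered as "clustering[content...]". */
    static List<String> groupRows(List<Atom> atoms) {
        List<String> rows = new ArrayList<>();
        String currentRow = null;
        List<String> content = new ArrayList<>();
        for (Atom atom : atoms) {
            if (atom.isCell) {
                if (currentRow != null && !currentRow.equals(atom.row)) {
                    rows.add(currentRow + content); // a cell of another row closes the current one
                    content = new ArrayList<>();
                }
                currentRow = atom.row;
                content.add(atom.column == null ? "marker" : atom.column);
            } else if (currentRow != null && atom.column != null && atom.row.equals(currentRow)) {
                content.add("deleted(" + atom.column + ")"); // collection tombstone folded into the row
            } else if (currentRow != null && atom.startsAfterRow) {
                rows.add(currentRow + content); // tombstone is past the row: the row is generated now
                currentRow = null;
                content = new ArrayList<>();
            }
            // Any other tombstone is ignored by this toy model.
        }
        if (currentRow != null) {
            rows.add(currentRow + content);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(groupRows(exampleAtoms()));
    }
}
```

With the atoms in the order listed above, this model yields two entries for clustering {{A:X}}: one holding the row marker, {{a}} and the deletion of {{b}}, and a second one holding only {{c}}.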


> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones 
> interacts with collection tombstones
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15805
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination, Local/SSTable
>            Reporter: Sylvain Lebresne
>            Priority: Normal
>
> The legacy reading code ({{LegacyLayout}} and 
> {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly 
> the case where a range tombstone covering multiple rows interacts with a 
> collection tombstone.
> A simple example of this problem is if one runs on 2.X:
> {noformat}
> CREATE TABLE t (
>   k int,
>   c1 text,
>   c2 text,
>   a text,
>   b set<text>,
>   c text,
>   PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Inserts a row covered by that previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 
> 'bar') USING TIMESTAMP 2;
> // Delete the collection of that previously inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X';
> {noformat}
> If the above is run on 2.X (with everything either flushed in the same 
> table or compacted together), then reading that data on 3.X will result in 
> the inserted row being duplicated (one part containing the {{a}} column, the 
> other the {{c}} one).
> I will note that this is _not_ a duplicate of CASSANDRA-15789 and this 
> reproduces even with the fix to {{LegacyLayout}} from that ticket. That said, 
> the additional code added in CASSANDRA-15789 to force merging duplicated rows 
> if they are produced _will_ end up fixing this as a consequence (assuming 
> there is no variation of this problem that leads to other visible issues than 
> duplicated rows). Still, I "think" we'd rather fix the source of 
> the issue.
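
As an illustration of what "merging duplicated rows" could look like, here is a minimal sketch that folds consecutive rows sharing the same clustering into one. It is an assumption-laden toy and not the code from CASSANDRA-15789: {{Row}} and {{mergeAdjacentDuplicates}} are hypothetical names, and the merge ignores timestamps and tombstones, which a real reconciliation would have to take into account.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeDuplicateRowsSketch {

    /** Toy row: a clustering plus column values (assumption: not Cassandra's Row). */
    static final class Row {
        final String clustering;
        final Map<String, String> columns = new LinkedHashMap<>();

        Row(String clustering) { this.clustering = clustering; }

        Row with(String column, String value) {
            columns.put(column, value);
            return this;
        }
    }

    /** Merges each run of adjacent rows with equal clustering into a single row. */
    static List<Row> mergeAdjacentDuplicates(List<Row> rows) {
        List<Row> merged = new ArrayList<>();
        for (Row row : rows) {
            Row last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && last.clustering.equals(row.clustering)) {
                last.columns.putAll(row.columns); // same clustering: fold into the previous row
            } else {
                Row copy = new Row(row.clustering); // copy so the input rows are left untouched
                copy.columns.putAll(row.columns);
                merged.add(copy);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Row> duplicated = new ArrayList<>();
        duplicated.add(new Row("A:X").with("a", "foo"));
        duplicated.add(new Row("A:X").with("c", "bar"));
        for (Row row : mergeAdjacentDuplicates(duplicated)) {
            System.out.println(row.clustering + " " + row.columns);
        }
    }
}
```

Applied to the two halves of the duplicated row from the example, this produces a single {{A:X}} row carrying both {{a}} and {{c}}. It hides the symptom rather than fixing the out-of-order atom that causes it.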


