[
https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105536#comment-17105536
]
Sylvain Lebresne commented on CASSANDRA-15805:
----------------------------------------------
To understand why this happens, let me write down the atoms that the example in
the description generates on 2.X (using a simplified representation that I hope
is clear enough):
{noformat}
atom1: RT([A:_, A:X:b:_])@1,     // beginning of all 'A' rows to beginning of
                                 // A:X's b column
atom2: Cell(A:X:)@2,             // row marker for A:X
atom3: Cell(A:X:a=foo)@2,        // value of a in A:X
atom4: RT([A:X:b:_, A:X:b:!])@3, // collection tombstone for b in A:X
atom5: RT([A:X:b:!, A:!])@1,     // remainder of covering RT, from end of b in
                                 // A:X to end of all 'A' rows
atom6: Cell(A:X:c=bar)@2         // value of c in A:X
{noformat}
Those atoms are deserialized into {{LegacyCell}} and {{LegacyRangeTombstone}}
on 3.X as:
{noformat}
atom1: RT(Bound(INCL_START_BOUND(A), collection=null)-Bound(EXCL_END_BOUND(A:B), collection=null), deletedAt=1, localDeletion=1589204864)
atom2: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=null, collElt=null), v=, ts=2, ldt=2147483647, ttl=0)
atom3: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=a, collElt=null), v=foo, ts=2, ldt=2147483647, ttl=0)
atom4: RT(Bound(INCL_START_BOUND(A:X), collection=b)-Bound(INCL_END_BOUND(A:X), collection=b), deletedAt=3, localDeletion=1589204864)
atom5: RT(Bound(EXCL_START_BOUND(A:X), collection=null)-Bound(INCL_END_BOUND(A), collection=null), deletedAt=1, localDeletion=1589204864)
atom6: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=c, collElt=null), v=bar, ts=2, ldt=2147483647, ttl=0)
{noformat}
I'll point out that those are a direct translation of the 2.X atoms except for
{{atom1}} and {{atom5}} that are slightly different:
* instead of {{atom1}} stopping at the beginning of the row's {{b}} column, it
extends to the end of the row.
* and instead of {{atom5}} starting after that {{b}} column, it starts after the
row. Do note however that the order of atoms is still the one above, so that
atom is effectively out of order.
The reason for those differences is the logic [at the beginning of
{{LegacyLayout.RangeTombstone}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1883],
which the comment there tries to explain. Basically, the legacy layer has to
map every 2.X RT into either a 3.X range tombstone (so one over multiple rows),
a row tombstone or a collection one.
Anyway, as mentioned above, the problem is that {{atom5}} is out of order.
What currently happens is that when {{atom5}} is encountered by
{{UnfilteredDeserializer.OldFormatDeserializer}}, it will be passed to the
{{CellGrouper}} currently grouping the row, and will end up in the
[{{CellGrouper#addGenericTombstone}}
method|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1544].
But, because that atom starts strictly after the row being grouped, the
method returns {{false}} and the row is generated a first time. Later, we get
{{atom6}} which restarts the row with the value of column {{c}}, after which it
is generated a second time.
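
To make that sequence concrete, here is a minimal, self-contained sketch of the grouping behaviour described above. It is not the actual {{CellGrouper}} or {{OldFormatDeserializer}} code: the {{Atom}} class, the {{group}} method and the rule "a multi-row RT is never accepted into the row in progress" are simplifying assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DuplicateRowSketch {
    /** Hypothetical stand-in for a legacy atom; row == null marks a multi-row range tombstone. */
    static final class Atom {
        final String name;
        final String row;
        Atom(String name, String row) { this.name = name; this.row = row; }
    }

    /** Groups atoms into rows, emitting the row in progress whenever an atom is not accepted into it. */
    static List<String> group(List<Atom> atoms) {
        List<String> out = new ArrayList<>();
        String currentRow = null;
        List<String> members = new ArrayList<>();
        for (Atom atom : atoms) {
            // Assumption: only atoms belonging to the row being grouped are accepted.
            boolean accepted = currentRow != null && currentRow.equals(atom.row);
            if (!accepted && currentRow != null) {
                out.add("Row(" + currentRow + ")" + members);
                currentRow = null;
                members = new ArrayList<>();
            }
            if (atom.row == null) {
                out.add("RT(" + atom.name + ")"); // multi-row tombstone, emitted on its own
            } else {
                currentRow = atom.row;            // start or continue a row
                members.add(atom.name);
            }
        }
        if (currentRow != null) out.add("Row(" + currentRow + ")" + members);
        return out;
    }

    /** Counts how many times a given row was emitted. */
    static long rowCount(List<String> emitted, String row) {
        return emitted.stream().filter(s -> s.startsWith("Row(" + row + ")")).count();
    }

    /** The six atoms of the example, in their on-disk order (atom5 is out of order). */
    static List<Atom> exampleAtoms() {
        return Arrays.asList(
            new Atom("atom1", null),   // RT from start of 'A' to end of row A:X
            new Atom("atom2", "A:X"),  // row marker
            new Atom("atom3", "A:X"),  // a=foo
            new Atom("atom4", "A:X"),  // collection tombstone for b
            new Atom("atom5", null),   // RT starting after row A:X
            new Atom("atom6", "A:X")); // c=bar
    }

    public static void main(String[] args) {
        List<String> emitted = group(exampleAtoms());
        emitted.forEach(System.out::println);
        System.out.println("A:X emitted " + rowCount(emitted, "A:X") + " times");
    }
}
```

In this sketch, {{atom5}} closes row {{A:X}} before {{atom6}} has been seen, so the row is emitted once with {{atom2}} to {{atom4}} and a second time with only {{atom6}}. If {{atom5}} came after {{atom6}} instead, the same loop would emit the row once.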
> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones
> interacts with collection tombstones
> ------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15805
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination, Local/SSTable
> Reporter: Sylvain Lebresne
> Priority: Normal
>
> The legacy reading code ({{LegacyLayout}} and
> {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly
> the case where a range tombstone covering multiple rows interacts with a
> collection tombstone.
> A simple example of this problem is if one runs on 2.X:
> {noformat}
> CREATE TABLE t (
> k int,
> c1 text,
> c2 text,
> a text,
> b set<text>,
> c text,
> PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Inserts a row covered by that previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'},
> 'bar') USING TIMESTAMP 2;
> // Delete the collection of that previously inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X';
> {noformat}
> If the above is run on 2.X (with everything either flushed in the same
> sstable or compacted together), then this will result in the inserted row being
> duplicated (one part containing the {{a}} column, the other the {{c}} one).
> I will note that this is _not_ a duplicate of CASSANDRA-15789 and this
> reproduces even with the fix to {{LegacyLayout}} from that ticket. That said,
> the additional code added in CASSANDRA-15789 to force merging duplicated rows
> if they are produced _will_ end up fixing this as a consequence (assuming
> there is no variation of this problem that leads to other visible issues than
> duplicated rows). Still, I "think" we'd rather fix the source of
> the issue.