[ https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105536#comment-17105536 ]
Sylvain Lebresne commented on CASSANDRA-15805:
----------------------------------------------

To understand why this happens, let me write down the atoms that the example of the description generates on 2.X (using a simplified representation that I hope is clear enough):
{noformat}
atom1: RT([A:_, A:X:b:_])@1,     // beginning of all 'A' rows to beginning of A:X's b column
atom2: Cell(A:X:)@2,             // row marker for A:X
atom3: Cell(A:X:a=foo)@2,        // value of a in A:X
atom4: RT([A:X:b:_, A:X:b:!])@3, // collection tombstone for b in A:X
atom5: RT([A:X:b:!, A:!])@1,     // remainder of covering RT, from end of b in A:X to end of all 'A' rows
atom6: Cell(A:X:c=bar)@2         // value of c in A:X
{noformat}
Those atoms are deserialized into {{LegacyCell}} and {{LegacyRangeTombstone}} on 3.X as:
{noformat}
atom1: RT(Bound(INCL_START_BOUND(A), collection=null)-Bound(EXCL_END_BOUND(A:B), collection=null), deletedAt=1, localDeletion=1589204864)
atom2: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=null, collElt=null), v=, ts=2, ldt=2147483647, ttl=0)
atom3: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=a, collElt=null), v=foo, ts=2, ldt=2147483647, ttl=0)
atom4: RT(Bound(INCL_START_BOUND(A:X), collection=b)-Bound(INCL_END_BOUND(A:X), collection=b), deletedAt=3, localDeletion=1589204864)
atom5: RT(Bound(EXCL_START_BOUND(A:X), collection=null)-Bound(INCL_END_BOUND(A), collection=null), deletedAt=1, localDeletion=1589204864)
atom6: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=c, collElt=null), v=bar, ts=2, ldt=2147483647, ttl=0)
{noformat}
I'll point out that those are a direct translation of the 2.X atoms, except for {{atom1}} and {{atom5}}, which are slightly different:
* instead of {{atom1}} stopping at the beginning of the row's {{b}} column, it extends to the end of the row.
* and instead of {{atom5}} starting after that {{b}} column, it starts after the row.

Do note, however, that the order of atoms is still the one above, so {{atom5}} is effectively out of order.
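To make the ordering claim concrete, here is a rough Python sketch (a toy model, not Cassandra code) that checks the start positions of the six 3.X atoms listed above. The {{Position}} class, the {{BEFORE}}/{{ROW}}/{{AFTER}} kinds and the {{compare}} function are hypothetical simplifications of how clustering bounds and rows are ordered; it also assumes that the collection tombstone ({{atom4}}) sorts together with the row it belongs to.

```python
from dataclasses import dataclass

# Hypothetical position kinds for this toy model (not Cassandra's own types):
# BEFORE = inclusive start bound (sorts before everything sharing the prefix),
# ROW    = a cell or collection tombstone belonging to the row itself,
# AFTER  = exclusive start bound (sorts after everything sharing the prefix).
BEFORE, ROW, AFTER = -1, 0, 1


@dataclass(frozen=True)
class Position:
    clustering: tuple  # clustering prefix, e.g. ("A",) or ("A", "X")
    kind: int


def compare(p, q):
    """Return a negative, zero or positive int if p sorts before, with or after q."""
    for a, b in zip(p.clustering, q.clustering):
        if a != b:
            return -1 if a < b else 1
    if len(p.clustering) == len(q.clustering):
        return (p.kind > q.kind) - (p.kind < q.kind)
    # One clustering is a strict prefix of the other: the kind of the
    # shorter one decides on which side of the longer one it falls.
    if len(p.clustering) < len(q.clustering):
        return -1 if p.kind == BEFORE else 1
    return 1 if q.kind == BEFORE else -1


# Start positions of the atoms as deserialized on 3.X, in their on-disk order.
atoms = [
    ("atom1", Position(("A",), BEFORE)),     # INCL_START_BOUND(A)
    ("atom2", Position(("A", "X"), ROW)),    # row marker of A:X
    ("atom3", Position(("A", "X"), ROW)),    # column a of A:X
    ("atom4", Position(("A", "X"), ROW)),    # collection tombstone (assumed to sort with its row)
    ("atom5", Position(("A", "X"), AFTER)),  # EXCL_START_BOUND(A:X)
    ("atom6", Position(("A", "X"), ROW)),    # column c of A:X
]

# Collect every consecutive pair whose first atom sorts after the second.
inversions = [
    (n1, n2)
    for (n1, p1), (n2, p2) in zip(atoms, atoms[1:])
    if compare(p1, p2) > 0
]
print(inversions)
```

In this model the only inversion is between {{atom5}} and {{atom6}}: {{atom5}} now starts after the whole A:X row, yet a cell of that row still follows it.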
The reason for those differences is the logic [at the beginning of {{LegacyLayout.RangeTombstone}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1883], which the comment there tries to explain, but it basically comes down to the legacy layer having to map every 2.X RT into either a 3.X range tombstone (so one over multiple rows), a row tombstone or a collection one.

Anyway, as mentioned above, the problem is that {{atom5}} is out of order. What currently happens is that when {{atom5}} is encountered by {{UnfilteredDeserializer.OldFormatDeserializer}}, it is passed to the {{CellGrouper}} currently grouping the row, and ends up in the [{{CellGrouper#addGenericTombstone}} method|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1544]. But, because that atom starts strictly after the row being grouped, the method returns {{false}} and the row is generated a first time. Later, we get {{atom6}}, which restarts the row with the value of column {{c}}, after which the row is generated a second time.

> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15805
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination, Local/SSTable
>            Reporter: Sylvain Lebresne
>            Priority: Normal
>
> The legacy reading code ({{LegacyLayout}} and {{UnfilteredDeserializer.OldFormatDeserializer}}) does not correctly handle the case where a range tombstone covering multiple rows interacts with a collection tombstone.
> A simple example of this problem is if one runs on 2.X:
> {noformat}
> CREATE TABLE t (
>   k int,
>   c1 text,
>   c2 text,
>   a text,
>   b set<text>,
>   c text,
>   PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Insert a row covered by that previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 'bar') USING TIMESTAMP 2;
> // Delete the collection of that previously inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' AND c2 = 'X';
> {noformat}
> If the above is run on 2.X (with everything either flushed in the same table or compacted together), then this will result, after the upgrade to 3.X, in the inserted row being duplicated (one part containing the {{a}} column, the other the {{c}} one).
> I will note that this is _not_ a duplicate of CASSANDRA-15789, and that this reproduces even with the fix to {{LegacyLayout}} of that ticket. That said, the additional code added in CASSANDRA-15789 to force merging duplicated rows if they are produced _will_ end up fixing this as a consequence (assuming there is no variation of this problem that leads to other visible issues than duplicated rows). Still, I "think" we'd rather fix the source of the issue.

--
This message was sent by Atlassian Jira (v8.3.4#803005)