[ https://issues.apache.org/jira/browse/CASSANDRA-12765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561315#comment-15561315 ]
Cameron Zemek commented on CASSANDRA-12765:
-------------------------------------------

Traced the issue to:
{code:title=CollationController.java|borderStyle=solid}
private ColumnFamily collectAllData(boolean copyOnHeap)
{
    // omitted for brevity
    if (!filter.shouldInclude(sstable))
    {
        nonIntersectingSSTables++;
        // sstable contains no tombstone if maxLocalDeletionTime == Integer.MAX_VALUE, so we can safely skip those entirely
        if (sstable.getSSTableMetadata().maxLocalDeletionTime != Integer.MAX_VALUE)
        {
            if (skippedSSTables == null)
                skippedSSTables = new ArrayList<>();
            skippedSSTables.add(sstable);
        }
        continue;
    }
    // omitted for brevity
}
{code}

The sstable is excluded by the filter because:
{code:title=SliceQueryFilter.java|borderStyle=solid}
public boolean shouldInclude(SSTableReader sstable)
{
    List<ByteBuffer> minColumnNames = sstable.getSSTableMetadata().minColumnNames;
    List<ByteBuffer> maxColumnNames = sstable.getSSTableMetadata().maxColumnNames;
    CellNameType comparator = sstable.metadata.comparator;

    if (minColumnNames.isEmpty() || maxColumnNames.isEmpty())
        return true;

    for (ColumnSlice slice : slices)
        if (slice.intersects(minColumnNames, maxColumnNames, comparator, reversed))
            return true;

    return false;
}
{code}

The sstable that holds the deletion also holds the other partition, so its minColumnNames and maxColumnNames are not empty, and because that partition's clustering key is different (e.g. test2) the query's slice does not intersect them. Execution therefore enters the if (!filter.shouldInclude(sstable)) branch.

The comment claiming that maxLocalDeletionTime == Integer.MAX_VALUE means the sstable contains no tombstones is wrong. In the steps to reproduce, the sstable that contains the row-level deletion and the other partition has maxLocalDeletionTime == Integer.MAX_VALUE in its metadata because of the live cell: live cells report a local deletion time of Integer.MAX_VALUE, so a single live cell anywhere in the sstable raises the maximum to that value. The sstable is then skipped entirely and its tombstone is never applied.
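To make the aggregation concrete, here is a small self-contained Java model of the second flushed sstable. It is a hypothetical sketch, not Cassandra code: the Partition class, the helper names and the deletion timestamp are assumptions. It only assumes that live cells report Integer.MAX_VALUE as their local deletion time and that the sstable-wide value is a maximum over everything in the file; even when the partition tombstone is counted, the live cell in the other partition pushes the maximum to Integer.MAX_VALUE.

```java
import java.util.List;

// Hypothetical model (not Cassandra's classes): each partition contributes its
// partition-level deletion time and the local deletion times of its cells.
final class MaxDeletionTimeSketch {
    // Live cells report Integer.MAX_VALUE as their local deletion time.
    static final int LIVE = Integer.MAX_VALUE;

    static final class Partition {
        final String key;
        final int partitionDeletionTime; // LIVE when the partition is not deleted
        final List<Integer> cellDeletionTimes;

        Partition(String key, int partitionDeletionTime, List<Integer> cellDeletionTimes) {
            this.key = key;
            this.partitionDeletionTime = partitionDeletionTime;
            this.cellDeletionTimes = cellDeletionTimes;
        }
    }

    // sstable-wide maximum, taken over partition deletions and cells alike
    static int maxLocalDeletionTime(List<Partition> partitions) {
        int max = Integer.MIN_VALUE;
        for (Partition p : partitions) {
            max = Math.max(max, p.partitionDeletionTime);
            for (int t : p.cellDeletionTimes) max = Math.max(max, t);
        }
        return max;
    }

    // Ground truth: does anything in the sstable carry a real deletion time?
    static boolean hasTombstone(List<Partition> partitions) {
        for (Partition p : partitions) {
            if (p.partitionDeletionTime != LIVE) return true;
            for (int t : p.cellDeletionTimes) if (t != LIVE) return true;
        }
        return false;
    }

    // The second sstable from the reproduction: the deleted partition (no
    // cells) plus the other partition holding a live cell.
    static List<Partition> secondSSTable() {
        int deletedAt = 1_476_000_000; // hypothetical local deletion time, in seconds
        return List.of(
                new Partition("8772618c9009cf8f5a5e0c18", deletedAt, List.of()),
                new Partition("8772618c9009cf8f5a5e0c19", LIVE, List.of(LIVE)));
    }

    public static void main(String[] args) {
        List<Partition> sstable = secondSSTable();
        System.out.println("maxLocalDeletionTime == Integer.MAX_VALUE: "
                + (maxLocalDeletionTime(sstable) == Integer.MAX_VALUE));
        System.out.println("contains a tombstone: " + hasTombstone(sstable));
    }
}
```

Running main prints true on both lines: the metadata looks tombstone-free while the sstable does hold a partition tombstone.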
The live cell's deletion time reaches the metadata here:
{code:title=ColumnFamily.java|borderStyle=solid}
public ColumnStats getColumnStats()
{
    // omitted for brevity
    for (Cell cell : this)
    {
        minTimestampTracker.update(cell.timestamp());
        maxTimestampTracker.update(cell.timestamp());
        maxDeletionTimeTracker.update(cell.getLocalDeletionTime());
        // omitted for brevity
    }
    // omitted for brevity
}
{code}

With the patch, the sstable is added to skippedSSTables and is therefore included because of its tombstones.

As far as I can tell this issue dates back to https://issues.apache.org/jira/browse/CASSANDRA-5514, but I haven't attempted to reproduce it in any version earlier than 2.1.15. It has been an issue on a cluster we are managing that started on 2.1.13, so I have currently tagged this bug as affecting versions since 2.0 beta 1, which corresponds to #5514.

> SSTable ignored incorrectly with row level tombstone
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12765
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12765
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Cameron Zemek
>         Attachments: 12765.patch
>
>
> {noformat}
> CREATE TABLE test.payload(
>   bucket_id TEXT,
>   name TEXT,
>   data TEXT,
>   PRIMARY KEY (bucket_id, name)
> );
> insert into test.payload (bucket_id, name, data) values
> ('8772618c9009cf8f5a5e0c18', 'test', 'hello');
> {noformat}
> Flush nodes (nodetool flush)
> {noformat}
> insert into test.payload (bucket_id, name, data) values
> ('8772618c9009cf8f5a5e0c19', 'test2', 'hello');
> delete from test.payload where bucket_id = '8772618c9009cf8f5a5e0c18';
> {noformat}
> Flush nodes (nodetool flush)
> {noformat}
> select * from test.payload where bucket_id = '8772618c9009cf8f5a5e0c18' and
> name = 'test';
> {noformat}
> Expected 0 rows but got 1 row back.
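The reproduction above can be modelled as a skip decision on the second sstable. The following Java sketch is hypothetical: the class, field and method names are assumptions, clustering names are compared as plain Strings with a single slice [test, test], and the "conservative" variant only illustrates the idea of not trusting maxLocalDeletionTime as proof of "no tombstones"; it is not the contents of 12765.patch.

```java
// Hypothetical, simplified model of the skip decision in the collectAllData
// excerpt; none of these names are Cassandra classes.
final class SkipDecisionSketch {

    enum Outcome { READ, SKIPPED_BUT_CHECKED_FOR_TOMBSTONES, SKIPPED_ENTIRELY }

    // Per-sstable metadata used by the decision. Clustering names are plain
    // Strings compared with String.compareTo (an assumption that matches
    // byte order for the ASCII values in the reproduction).
    static final class SSTableMeta {
        final String minColumnName; // null means nothing recorded
        final String maxColumnName;
        final int maxLocalDeletionTime;
        final boolean hasTombstoneForQueriedKey; // ground truth, not metadata

        SSTableMeta(String min, String max, int maxLocalDeletionTime, boolean hasTombstoneForQueriedKey) {
            this.minColumnName = min;
            this.maxColumnName = max;
            this.maxLocalDeletionTime = maxLocalDeletionTime;
            this.hasTombstoneForQueriedKey = hasTombstoneForQueriedKey;
        }
    }

    // Mirrors shouldInclude for one slice [start, finish]: include when no
    // names are recorded, or when the slice overlaps [min, max].
    static boolean shouldInclude(SSTableMeta s, String start, String finish) {
        if (s.minColumnName == null || s.maxColumnName == null) return true;
        return start.compareTo(s.maxColumnName) <= 0 && finish.compareTo(s.minColumnName) >= 0;
    }

    // The condition as written in the excerpt.
    static Outcome current(SSTableMeta s, String start, String finish) {
        if (shouldInclude(s, start, finish)) return Outcome.READ;
        return s.maxLocalDeletionTime != Integer.MAX_VALUE
                ? Outcome.SKIPPED_BUT_CHECKED_FOR_TOMBSTONES
                : Outcome.SKIPPED_ENTIRELY;
    }

    // Hypothetical conservative variant: always re-check non-intersecting
    // sstables for tombstones, whatever maxLocalDeletionTime says.
    static Outcome conservative(SSTableMeta s, String start, String finish) {
        return shouldInclude(s, start, finish) ? Outcome.READ : Outcome.SKIPPED_BUT_CHECKED_FOR_TOMBSTONES;
    }

    // The tombstone is lost only when the sstable holds one but is never looked at.
    static boolean tombstoneMissed(SSTableMeta s, Outcome o) {
        return s.hasTombstoneForQueriedKey && o == Outcome.SKIPPED_ENTIRELY;
    }

    // Second sstable: tombstone for ...c18 plus the live 'test2' row in ...c19.
    static SSTableMeta secondSSTable() {
        return new SSTableMeta("test2", "test2", Integer.MAX_VALUE, true);
    }

    public static void main(String[] args) {
        SSTableMeta s2 = secondSSTable();
        // Query: bucket_id = '...c18' AND name = 'test', i.e. the slice [test, test].
        Outcome a = current(s2, "test", "test");
        Outcome b = conservative(s2, "test", "test");
        System.out.println("current: " + a + ", tombstone missed: " + tombstoneMissed(s2, a));
        System.out.println("conservative: " + b + ", tombstone missed: " + tombstoneMissed(s2, b));
    }
}
```

With the condition as written, the second sstable is skipped entirely and its tombstone for 8772618c9009cf8f5a5e0c18 is never seen, which matches the deleted row coming back. The conservative variant would trade one extra tombstone check per non-intersecting sstable for a correct result.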