[
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stefan Podkowinski updated CASSANDRA-13153:
-------------------------------------------
    Resolution: Fixed
 Fix Version/s: 4.0
                3.11.0
                3.0.13
                2.2.10
 Reproduced In: 2.2.8, 2.2.7 (was: 2.2.7, 2.2.8)
         Status: Resolved (was: Ready to Commit)
Merged as 06316df549c0096bd774893a405d1d32512e97bf
> Reappeared Data when Mixing Incremental and Full Repairs
> --------------------------------------------------------
>
> Key: CASSANDRA-13153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13153
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction, Tools
> Environment: Apache Cassandra 2.2
> Reporter: Amanda Debrot
> Assignee: Stefan Podkowinski
> Labels: Cassandra
> Fix For: 2.2.10, 3.0.13, 3.11.0, 4.0
>
> Attachments: log-Reappeared-Data.txt,
> Step-by-Step-Simulate-Reappeared-Data.txt
>
>
> This happens for both LeveledCompactionStrategy and
> SizeTieredCompactionStrategy. I've only tested it on Cassandra version 2.2,
> but it most likely also affects all Cassandra versions after 2.2, as long as
> they still run anticompaction as part of full repair.
> When mixing incremental and full repairs, there are a few scenarios where the
> Data SSTable ends up marked as unrepaired while the Tombstone SSTable ends up
> marked as repaired. Then, once gc_grace has passed and the tombstone and data
> have been compacted out on the other replicas, the next incremental repair
> will push the Data to the other replicas without the tombstone.
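> One way to spot this mismatch on disk (assuming the sstablemetadata tool that
> ships with Cassandra; its location and exact output wording vary slightly
> between versions) is to check the "Repaired at" field of the SSTables holding
> the key: 0 means unrepaired, any other value is the repair time set during
> anticompaction.
>     tools/bin/sstablemetadata <data_dir>/<keyspace>/<table>/<sstable>-Data.db | grep "Repaired at"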
> Simplified scenario:
> 3 node cluster with RF=3
> Initial config:
> Node 1 has data and tombstone in separate SSTables.
> Node 2 has data and no tombstone.
> Node 3 has data and tombstone in separate SSTables.
> Incremental repair (nodetool repair -pr) is run every day, so now we have the
> tombstone on each node.
> Some minor compactions have happened since, so the data and tombstone get
> merged into one SSTable on Nodes 1 and 3.
> Node 1 had a minor compaction that merged the data with the tombstone: one
> SSTable, containing the tombstone.
> Node 2 has data and tombstone in separate SSTables.
> Node 3 had a minor compaction that merged the data with the tombstone: one
> SSTable, containing the tombstone.
> Incremental repairs keep running every day.
> Full repairs run weekly (nodetool repair -full -pr).
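> For reference (keyspace and table names below are placeholders), the repair
> commands referred to above are:
>     nodetool repair -pr <keyspace>         # incremental repair (the default in 2.2), primary range only
>     nodetool repair -full -pr <keyspace>   # full repair, which in 2.2 still anticompacts the repaired ranges
> When reproducing the scenario by hand, nodetool compact <keyspace> <table> can
> be used in place of the minor compactions to force the data and tombstone into
> a single SSTable.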
> Now there are 2 scenarios where the Data SSTable will get marked as
> "Unrepaired" while the Tombstone SSTable will get marked as "Repaired".
> Scenario 1:
> Since the Data and Tombstone SSTables have been marked as "Repaired"
> and anticompacted, they have had minor compactions with other SSTables
> containing keys from other ranges. During full repair, if the last node to
> run it doesn't own this particular key in its partitioner range, the Data
> and Tombstone SSTables will get anticompacted and marked as "Unrepaired". Now
> in the next incremental repair, if the Data SSTable is involved in a minor
> compaction during the repair but the Tombstone SSTable is not, the resulting
> compacted SSTable will be marked "Unrepaired" while the Tombstone SSTable is
> marked "Repaired".
> Scenario 2:
> Only the Data SSTable had minor compactions with other SSTables
> containing keys from other ranges after being marked as "Repaired". The
> Tombstone SSTable was never involved in a minor compaction, so all keys in
> that SSTable belong to one particular partitioner range. During full repair,
> if the last node to run it doesn't own this particular key in its partitioner
> range, the Data SSTable will get anticompacted and marked as "Unrepaired",
> while the Tombstone SSTable stays marked as "Repaired".
> Then gc_grace passes. Since Nodes 1 and 3 only have one SSTable for that key,
> the tombstone gets compacted out.
> Node 1 has nothing.
> Node 2 has data (in unrepaired SSTable) and tombstone (in repaired
> SSTable) in separate SSTables.
> Node 3 has nothing.
> Now when the next incremental repair runs, it will only use the Data SSTable
> to build the Merkle tree, since the Tombstone SSTable is flagged as repaired
> and the Data SSTable is marked as unrepaired. The data then gets repaired
> (streamed) to the other two nodes.
> Node 1 has data.
> Node 2 has data and tombstone in separate SSTables.
> Node 3 has data.
> If a read request hits Nodes 1 and 3, it will return the data. If it hits
> Nodes 1 and 2, or Nodes 2 and 3, however, it will return no data.
> Tested this with single token ranges for simplicity.
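> To observe the read behaviour from a client, a query like the one below, run
> at CONSISTENCY QUORUM in cqlsh (keyspace, table and key names are
> placeholders), returns the deleted row when Nodes 1 and 3 are the two replicas
> that answer, and no row when Node 2 is one of them:
>     CONSISTENCY QUORUM;
>     SELECT * FROM ks.tbl WHERE key = 'deleted-key';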
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)