Rick Branson created CASSANDRA-7764:
---------------------------------------

             Summary: RFC: Range movements will "wake up" previously invisible data
                 Key: CASSANDRA-7764
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7764
             Project: Cassandra
          Issue Type: Bug
            Reporter: Rick Branson


Presumably this has been going on as long as Cassandra has existed, but I 
wanted to capture it here since it came up in an IRC discussion. This issue 
will probably show up on any cluster eventually.

Scenario:

1) Start with a 3-node cluster, RF=1
2) A 4th node is added to the cluster
3) Data is deleted on ranges belonging to the 4th node
4) Wait for GC to clean up some tombstones on the 4th node (i.e. compaction 
   purges them once gc_grace_seconds has passed)
5) The 4th node is removed from the cluster
6) The deleted data reappears, since copies of it were still lying dormant on 
   the original 3 nodes
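
To make the sequence concrete, here is a minimal toy model of the scenario 
above. It is not Cassandra code: the modulo-based owner() rule, the node names 
and the dict-per-node storage are all assumptions made up for illustration, 
standing in for the token ring and SSTables.

```python
# Toy model of the scenario, NOT Cassandra internals.
# Assumptions: RF=1, a made-up modulo placement rule, one dict per node.
TOMBSTONE = object()  # sentinel marking a deleted row


def owner(key, ring):
    """Hypothetical placement rule: modulo over the node list."""
    return ring[key % len(ring)]


def read(key, ring, storage):
    """Read from the single owning replica (RF=1)."""
    value = storage[owner(key, ring)].get(key)
    return None if value is None or value is TOMBSTONE else value


def simulate():
    # 1) 3-node cluster, one row written
    ring = ["n1", "n2", "n3"]
    storage = {node: {} for node in ring}
    key = 3  # 3 % 3 == 0 -> n1 now; 3 % 4 == 3 -> n4 after the 4th node joins
    storage[owner(key, ring)][key] = "row"

    # 2) n4 joins and receives the rows it now owns.
    #    The old owners keep their copies because no cleanup is run.
    old_ring = list(ring)
    ring = ring + ["n4"]
    storage["n4"] = {}
    for node in old_ring:
        for k, v in list(storage[node].items()):
            if owner(k, ring) == "n4":
                storage["n4"][k] = v

    # 3) The delete only reaches the current owner, n4
    storage[owner(key, ring)][key] = TOMBSTONE

    # 4) GC on n4 purges the tombstone together with the row it shadowed
    storage["n4"] = {k: v for k, v in storage["n4"].items() if v is not TOMBSTONE}

    # 5) n4 leaves and hands back whatever it still holds (nothing for `key`)
    ring = old_ring
    for k, v in storage.pop("n4").items():
        storage[owner(k, ring)][k] = v

    # 6) The read is served by n1, which never saw the delete
    return read(key, ring, storage)


print(simulate())  # prints "row": the deleted data is visible again
```

The point of the sketch is step 2: the stale copy on n1 is harmless while n4 
owns the range, and only becomes visible again once ownership moves back.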

This could definitely happen in many other situations where dormant data could 
exist, such as inconsistencies that aren't resolved before a range movement, 
but the case above seemed the most reasonable to propose as a real-world 
problem.

The cleanup operation (nodetool cleanup) can be used to get rid of the dormant 
data, but in my experience people don't run cleanup unless they're low on 
disk. Running it is definitely not treated as a best practice for data 
integrity.
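
As a rough illustration of what cleanup achieves conceptually, the sketch below 
drops every row a node no longer owns under the current ring. The owner() rule 
and the cleanup() helper are hypothetical stand-ins for illustration, not how 
nodetool cleanup is implemented.

```python
# Toy illustration only; the names and placement rule are assumptions.
def owner(key, ring):
    """Hypothetical placement rule: modulo over the node list."""
    return ring[key % len(ring)]


def cleanup(node, rows, ring):
    """Return only the rows that `node` still owns under the current ring."""
    return {k: v for k, v in rows.items() if owner(k, ring) == node}


ring = ["n1", "n2", "n3", "n4"]
n1_rows = {3: "row", 4: "other"}  # key 3 now belongs to n4, key 4 still to n1
n1_after = cleanup("n1", n1_rows, ring)
print(n1_after)  # {4: 'other'}: the dormant copy of key 3 is gone
```

Had this run on the old owners right after the 4th node joined, there would be 
no dormant copy left to resurface when the range moved back.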



--
This message was sent by Atlassian JIRA
(v6.2#6252)
