Nick Bailey created CASSANDRA-6011:
--------------------------------------

             Summary: Race condition in snapshot repair
                 Key: CASSANDRA-6011
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6011
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Nick Bailey
             Fix For: 1.2.10, 2.0.1


When we do a snapshot/sequential repair, we use the repair session ID as the 
snapshot name. Unfortunately, when we delete a snapshot in Directories.java, we 
delete it for all column families, even when called on a specific CF store.

So what can happen is this:

1. Node B finishes validation compaction for CF1 and notifies Node A.
2. Node B *starts* to delete the snapshot for CF1.
3. Node A finishes repair of CF1 and starts repair of CF2.
4. Node B takes a snapshot of CF2 and starts validation compaction, but the 
   previous validation compaction is still deleting snapshots, so the snapshot 
   the new validation wants to run on gets deleted out from under it.
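The ordering above can be replayed in a small, single-threaded sketch. This is only an in-memory model, not Cassandra code: the class `SnapshotRaceReplay`, its methods, and the session name `session-1` are all hypothetical, and the "slow" delete is modeled simply by letting it complete after the CF2 snapshot is taken.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class SnapshotRaceReplay {
    // Hypothetical stand-in for the disk: column family -> snapshot names present.
    private final Map<String, Set<String>> snapshots = new TreeMap<>();

    public void takeSnapshot(String columnFamily, String name) {
        snapshots.computeIfAbsent(columnFamily, k -> new HashSet<>()).add(name);
    }

    // Models the reported behavior: the caller asks for one column family,
    // but the snapshot name is removed from every column family.
    public void clearSnapshotBuggy(String requestedColumnFamily, String name) {
        for (Set<String> names : snapshots.values()) {
            names.remove(name);
        }
    }

    public boolean hasSnapshot(String columnFamily, String name) {
        return snapshots.getOrDefault(columnFamily, Collections.<String>emptySet())
                        .contains(name);
    }

    // Replays the four steps in order; returns whether CF2's snapshot survived.
    public static boolean replay() {
        SnapshotRaceReplay nodeB = new SnapshotRaceReplay();
        String session = "session-1"; // the repair session id doubles as snapshot name

        nodeB.takeSnapshot("CF1", session);
        // Step 1: Node B finishes validating CF1 and notifies Node A.
        // Step 2: Node B starts deleting the CF1 snapshot (still in progress).
        // Step 3: Node A moves on to CF2.
        nodeB.takeSnapshot("CF2", session);       // Step 4: snapshot for CF2
        nodeB.clearSnapshotBuggy("CF1", session); // delete from step 2 completes
        return nodeB.hasSnapshot("CF2", session);
    }

    public static void main(String[] args) {
        System.out.println("CF2 snapshot survived: " + replay());
    }
}
```

In this model `replay()` returns `false`: the delete issued for CF1 also removes the CF2 snapshot, because both share the session ID as their name.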

I've only reproduced this on 1.2.6, but from reading the code it looks like 
the bug still exists in 1.2 HEAD. I'm not positive about 2.0.

I think the fix is just to update Directories.java so that deleting a snapshot 
only removes it for the column family it was called on, not for all column 
families.
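A minimal sketch of what a scoped delete could look like, next to the keyspace-wide behavior described above. This is not the actual Directories.java code or the committed patch: the class and method names are hypothetical, and the on-disk layout `<keyspaceDir>/<columnFamily>/snapshots/<snapshotName>` is an assumption made for illustration.

```java
import java.io.File;

public class ScopedSnapshotCleanup {
    // Assumed layout: <keyspaceDir>/<columnFamily>/snapshots/<snapshotName>
    public static File snapshotDir(File keyspaceDir, String columnFamily, String snapshotName) {
        return new File(new File(new File(keyspaceDir, columnFamily), "snapshots"), snapshotName);
    }

    // Deletes a file or directory tree; a missing path is silently ignored.
    public static void deleteRecursively(File f) {
        File[] children = f.listFiles(); // null when f is missing or not a directory
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        f.delete();
    }

    // Behavior described in the report: the snapshot is removed from every
    // column family directory under the keyspace.
    public static void clearSnapshotEverywhere(File keyspaceDir, String snapshotName) {
        File[] cfDirs = keyspaceDir.listFiles();
        if (cfDirs == null) {
            return;
        }
        for (File cfDir : cfDirs) {
            deleteRecursively(new File(new File(cfDir, "snapshots"), snapshotName));
        }
    }

    // Proposed behavior: only the named column family's snapshot is removed.
    public static void clearSnapshotForCf(File keyspaceDir, String columnFamily, String snapshotName) {
        deleteRecursively(snapshotDir(keyspaceDir, columnFamily, snapshotName));
    }

    public static void main(String[] args) {
        File keyspace = new File(System.getProperty("java.io.tmpdir"), "ks-" + System.nanoTime());
        String session = "repair-session-1"; // hypothetical session id used as snapshot name
        snapshotDir(keyspace, "CF1", session).mkdirs();
        snapshotDir(keyspace, "CF2", session).mkdirs();

        clearSnapshotForCf(keyspace, "CF1", session);
        System.out.println("CF1 snapshot exists: " + snapshotDir(keyspace, "CF1", session).exists());
        System.out.println("CF2 snapshot exists: " + snapshotDir(keyspace, "CF2", session).exists());

        deleteRecursively(keyspace); // clean up the temporary tree
    }
}
```

With the scoped variant, the CF2 snapshot is left in place while CF1's is removed, so a validation compaction already running against CF2 would keep its files.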

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
