We are using Cassandra 1.2.10 (with JNA installed) on Ubuntu 12.04.3, and we run our instances in Amazon Web Services.
What I am trying to do

Our Cassandra data lives on an EBS volume. This lets us take EBS snapshots of the data, create volumes from those snapshots, and restore them wherever we need to.

Snapshot process (each step runs only after the previous command has returned):
Step 1: Log in to the Cassandra node.
Step 2: Run nodetool clearsnapshot.
Step 3: Run nodetool snapshot.
Step 4: Take the EBS snapshot.

Restore process:
Step 1: Remove data/system, commit_log, saved_caches, and data/<keyspace>/* (excluding the snapshot directory).
Step 2: Move all snapshot files into their respective keyspace/column family locations.
Step 3: Start Cassandra.
Step 4: Create the schema.
Step 5: Check the log. This is where I find a corrupted SSTable in our keyspace (not in the system keyspace).

Troubleshooting

I suspected a race condition, so I did the following:
- Inserted a 60-second sleep after issuing "nodetool clearsnapshot".
- Inserted a 60-second sleep after issuing "nodetool snapshot".
- Took the EBS snapshot.
- Restored the snapshot using the same restore steps as above.

This worked with no problems at all. So my assumption is that Cassandra is still doing some work after "nodetool snapshot" returns.

My question: how can I tell when a snapshot is fully complete, so that I do not end up with corrupted SSTables? I can reproduce this 100% of the time.

Thanks for your help.
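For reference, the snapshot sequence described in the question, including the 60-second sleeps from the troubleshooting run, might be scripted roughly like this. It is a minimal sketch under a few assumptions. `nodetool` must be on the PATH. The EBS snapshot command is supplied by the caller, since the question does not say which tool was used. The helper names `take_backup` and `run_command` are hypothetical. Note that the fixed delay only reproduces the workaround; it does not detect when the snapshot is actually complete.

```python
import subprocess
import time
from typing import Callable, List, Sequence

# A runner executes one command (argv list) and blocks until it exits.
Runner = Callable[[Sequence[str]], None]


def run_command(argv: Sequence[str]) -> None:
    """Run a command; raises CalledProcessError if it exits non-zero."""
    subprocess.run(list(argv), check=True)


def take_backup(
    ebs_snapshot_cmd: Sequence[str],
    settle_seconds: float = 60.0,
    run: Runner = run_command,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Hypothetical helper: clearsnapshot, snapshot, then the EBS snapshot.

    Each command starts only after the previous one has returned.
    settle_seconds is the workaround delay from the question, applied
    after each nodetool command (set it to 0 to reproduce the failing run).
    Returns the commands in the order they were executed.
    """
    steps = [
        ["nodetool", "clearsnapshot"],
        ["nodetool", "snapshot"],
        list(ebs_snapshot_cmd),  # caller-supplied, e.g. an AWS CLI call
    ]
    executed: List[List[str]] = []
    for index, argv in enumerate(steps):
        run(argv)
        executed.append(argv)
        # Pause only after the two nodetool steps, never after the EBS step.
        if index < 2 and settle_seconds > 0:
            sleep(settle_seconds)
    return executed
```

A call might look like `take_backup(["aws", "ec2", "create-snapshot", "--volume-id", "vol-XXXXXXXX"])`, where the volume ID is a placeholder. The command runner and the sleep function are parameters, so the ordering and delays can be checked without a live Cassandra node or AWS account.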