We are using Cassandra 1.2.10 (with JNA installed) on Ubuntu 12.04.3 and are 
running our instances in Amazon Web Services.

What I am trying to do.

Our Cassandra data lives on an EBS volume so that we can take snapshots of the 
data, create volumes from those snapshots, and restore them wherever we want 
to.

The snapshot process 

Step 1
Log in to the Cassandra node.

Step 2
Run nodetool clearsnapshot

Step 3
Run nodetool snapshot

Step 4
Take EBS snapshot

Each step is performed only after the previous step's command has returned.
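
For reference, the sequence above can be sketched roughly as the following 
Python script. This is only an illustration, not our actual tooling: the 
function name take_snapshot, the injectable run parameter, and the use of the 
AWS CLI ("aws ec2 create-snapshot") for Step 4 are assumptions made for the 
sketch.

```python
import subprocess


def take_snapshot(volume_id, run=subprocess.run):
    """Run the snapshot steps in order, waiting for each command to return.

    `run` is injectable (hypothetical parameter) so the sequence can be
    exercised without a real Cassandra node.
    """
    commands = [
        ["nodetool", "clearsnapshot"],  # Step 2: remove existing snapshots
        ["nodetool", "snapshot"],       # Step 3: snapshot all keyspaces
        # Step 4: assumes the AWS CLI is installed and configured
        ["aws", "ec2", "create-snapshot", "--volume-id", volume_id],
    ]
    for cmd in commands:
        # subprocess.run blocks until the command exits; check=True raises
        # CalledProcessError on a non-zero exit status.
        run(cmd, check=True)
    return commands
```

It would be called as take_snapshot("<volume id>"), with the ID of the EBS 
volume that holds the Cassandra data. Note that this only waits for each 
command to return, which is exactly the behaviour described above.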

Restore Process

Step 1
Remove data/system, commit_log, saved_caches, and data/<keyspace>/* 
(excluding the snapshot directory).

Step 2
Move all snapshot files into their respective KS/CF locations.

Step 3
Start Cassandra

Step 4 
Create the schema

Step 5
Look at the log. This is where I find a corrupted SSTable in our keyspace (not 
system).
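
To make restore Step 2 concrete, here is a minimal Python sketch of moving the 
snapshot files back into place. It assumes the on-disk layout 
<data dir>/<keyspace>/<column family>/snapshots/<snapshot name>/; the function 
name restore_snapshot_files and its parameters are hypothetical and are not 
taken from our real scripts.

```python
import shutil
from pathlib import Path


def restore_snapshot_files(data_dir, snapshot_name):
    """Move snapshot files up into their keyspace/column family directories.

    Assumed layout: <data_dir>/<ks>/<cf>/snapshots/<snapshot_name>/<files>
    Each file is moved to:  <data_dir>/<ks>/<cf>/<file>
    """
    moved = []
    pattern = f"*/*/snapshots/{snapshot_name}"
    for snap_dir in sorted(Path(data_dir).glob(pattern)):
        cf_dir = snap_dir.parent.parent  # the <ks>/<cf> directory
        for src in sorted(snap_dir.iterdir()):
            if src.is_file():
                dest = cf_dir / src.name
                shutil.move(str(src), str(dest))
                moved.append(dest)
    return moved
```

The function returns the list of destination paths, which makes it easy to 
check that every expected SSTable component was moved before starting 
Cassandra.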

Troubleshooting

I suspected a race condition so I did the following:

I inserted a 60-second sleep after issuing “nodetool clearsnapshot”.
I inserted a 60-second sleep after issuing “nodetool snapshot”.

I then took the snapshot and restored it following the same restore steps as 
above. It worked with no problems at all.

So my assumption is that Cassandra is doing a few more things after the 
“nodetool snapshot” returns.

Now that you know what is going on, I have my question.

How can I tell when a snapshot is fully complete so I do not have corrupted 
SSTables?

I can reproduce this 100% of the time.

Thanks for your help
