
I've been experimenting with Cassandra for quite a while now.

It's time for me to look at backups, but I'm not sure what the best practice
is. I want to be able to recover the data to a point in time before any
user or software error.

We will have two datacentres with 4 servers and RF=3.

Each datacentre will hold at most 1.6 TB of data (this figure includes
replication and LZ4 compression, and is estimated from test data). That is
ten years of data, after which we will start purging. This amounts to about
400 MB of data generated per day.
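
As a rough check of those figures, here is a small sketch of the arithmetic. It assumes decimal units (1 TB = 10**12 bytes) and 365-day years, and the variable and function names are only illustrative:

```python
# Rough sizing check for the figures above.
# Assumptions: decimal units (1 TB = 10**12 bytes), 365-day years.
TOTAL_BYTES_PER_DC = 1.6e12   # 1.6 TB per datacentre, replication and LZ4 included
RETENTION_DAYS = 10 * 365     # ten years of data before purging starts
REPLICATION_FACTOR = 3        # RF=3


def daily_growth_mb(total_bytes, days):
    """Average on-disk growth per day, in MB."""
    return total_bytes / days / 1e6


# Growth per datacentre including all replicas.
per_day_mb = daily_growth_mb(TOTAL_BYTES_PER_DC, RETENTION_DAYS)
# Growth of a single copy of the data (what one full backup copy would add).
unique_per_day_mb = per_day_mb / REPLICATION_FACTOR

print(f"{per_day_mb:.0f} MB/day with replicas, {unique_per_day_mb:.0f} MB/day per copy")
```

Under those assumptions the replicated growth comes out a little above the 400 MB per day quoted, and a single copy of the data grows by roughly a third of that, which matters when sizing a backup that stores only one replica.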

I've read about users taking snapshots of individual nodes and copying them
to S3 (Netflix), and I've read about creating virtual datacentres
(http://www.datastax.com/dev/blog/multi-datacenter-replication) where each
virtual datacentre contains a backup node.

There are advantages and disadvantages in both approaches. What are people
doing in their production systems?


Jabbar Azam
