Hello, I've been experimenting with Cassandra for quite a while now.
It's time for me to look at backups, but I'm not sure what the best practice is. I want to be able to recover the data to a point in time before any user or software error.

We will have two datacentres with 4 servers and RF=3. Each datacentre will hold at most 1.6 TB of data (this figure includes replication and LZ4 compression, and is based on test data). That is ten years of data, after which we will start purging. This amounts to about 400 MB of data generated per day.

I've read about users taking snapshots of individual nodes and storing them in S3 (Netflix), and I've read about creating virtual datacentres (http://www.datastax.com/dev/blog/multi-datacenter-replication) where each virtual datacentre contains a backup node. There are advantages and disadvantages to both approaches.

What are people doing in their production systems?

--
Thanks
Jabbar Azam