Thanks for the replies, guys. It sounds like restoration via snapshots, plus some application-side logic to sanity-check/repair any data written around the snapshot time, is the way to go.
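Since the per-node snapshots should be taken as close together in time as possible, here's a rough sketch of what I have in mind: a small Python script that fires the snapshot command at every node in parallel. The host list is just an example, and it assumes a build where nodetool is available on the path and can reach each node's JMX port (older builds shipped nodeprobe instead), so treat it as illustrative rather than a finished tool.

    #!/usr/bin/env python
    # Sketch: trigger a snapshot on every node at roughly the same moment,
    # so the per-node snapshots are as mutually consistent as eventual
    # consistency allows. Host list below is a placeholder.
    import subprocess
    import threading

    NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # your cluster's nodes

    def snapshot(host):
        # nodetool connects to the node over JMX; the snapshot lands in
        # that node's local data directories, to be copied off afterwards.
        subprocess.check_call(["nodetool", "-h", host, "snapshot"])

    threads = [threading.Thread(target=snapshot, args=(h,)) for h in NODES]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

After that runs, a separate step would rsync each node's snapshot directory to backup storage, and on restore we'd run the application-side repair pass over data written near the snapshot time.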
Edmond

On Mon, Oct 5, 2009 at 10:15 AM, Jonathan Ellis <[email protected]> wrote:
> On Mon, Oct 5, 2009 at 11:23 AM, Thorsten von Eicken <[email protected]> wrote:
>> Isn't the question about how you back up a cassandra cluster, not a
>> single node?
>
> Sure, but the generalization is straightforward. :)
>
>> Can you snapshot the various nodes at different times or do
>> they need to be synchronized?
>
> The closer the synchronization, the more consistent they will be.
> (Since Cassandra is designed around eventual consistency, there's some
> flexibility here. Conversely, there's no way to tell the system
> "don't accept any more writes until the snapshot is done.")
>
>> Is there a minimal set of nodes that are
>> sufficient to back up?
>
> Assuming your replication is 100% up to date, backing up every N nodes
> where N is the replication factor could be adequate in theory, but I
> wouldn't recommend trying to be clever like that, since if you
> "restored" from backup like that your system would be in a degraded
> state and vulnerable to any of the restored nodes failing.
>
> -Jonathan
