Isn't the question about how you back up a Cassandra cluster, not a
single node? Can you snapshot the various nodes at different times, or do
the snapshots need to be synchronized? Is there a minimal set of nodes
that is sufficient to back up?
   Thorsten

Jonathan Ellis wrote:
bin/nodeprobe snapshot

To restore, move the snapshot sstables from the snapshot location to
the live data location on each node (e.g. with dsh).
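
A minimal sketch of that restore step for a single node, in Python.
Both directory arguments are hypothetical placeholders, not paths taken
from this thread, and the sketch assumes the node is not serving
requests while the files are moved. It checks for name clashes first so
that a failed restore does not leave a half-moved set of files.

```python
import shutil
from pathlib import Path


def restore_snapshot(snapshot_dir, live_dir):
    """Move every file in snapshot_dir into live_dir; return the moved names.

    snapshot_dir and live_dir are hypothetical locations; substitute the
    real snapshot and data directories of the node being restored.
    """
    snapshot_dir = Path(snapshot_dir)
    live_dir = Path(live_dir)
    live_dir.mkdir(parents=True, exist_ok=True)

    # Only plain files are restored; nested directories are ignored.
    files = sorted(p for p in snapshot_dir.iterdir() if p.is_file())

    # Refuse to overwrite live files: check everything before moving anything.
    clashes = [p.name for p in files if (live_dir / p.name).exists()]
    if clashes:
        raise FileExistsError(f"already present in {live_dir}: {clashes}")

    for src in files:
        shutil.move(str(src), str(live_dir / src.name))
    return [p.name for p in files]
```

Running the same function on every node (for example via dsh, as
suggested above) would restore the whole cluster one node at a time.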

Note that the 0.4 branch, which will become 0.4.1, automatically
flushes each columnfamily when you ask for a snapshot of the table, so
you don't have to do that manually anymore.

On Mon, Oct 5, 2009 at 8:05 AM, Joe Van Dyk <[email protected]> wrote:
How do you take the snapshot?  What's the restore process?

On Mon, Oct 5, 2009 at 5:22 AM, Jonathan Ellis <[email protected]> wrote:
You can take a snapshot and either leave it in place indefinitely or
throw it into your existing backup ecosystem.  That's your best option
for backup no matter which kind of partitioner you're using.

-Jonathan

On Mon, Oct 5, 2009 at 12:52 AM, Edmond Lau <[email protected]> wrote:
For folks who are using or considering using Cassandra in their
production systems, what do you use for backups?

With HBase, one could potentially write a MapReduce job to perform a row
scan of the entire table (restricted to some historical timestamp to
get a consistent view) and export the data to HDFS.  With Cassandra,
if you're using an ordered partitioner, a similar mechanism could be
built over a key range scan.

With a random partitioner, though, there's no API to iterate through
all existing keys.  Why not?
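
A rough sketch of what such a paged scan might look like, in Python.
`fetch_keys(start, limit)` is a hypothetical stand-in for whatever
range-scan call the client exposes, not a real Cassandra API; it is
assumed to return up to `limit` keys greater than or equal to `start`,
in the same sorted order that Python uses to compare the keys.

```python
def scan_all_keys(fetch_keys, page_size=100):
    """Yield every key once by paging through an ordered key range.

    fetch_keys(start, limit) is a hypothetical callable returning up to
    `limit` keys >= start in sorted order.
    """
    if page_size < 2:
        # The start key is inclusive, so a page of 1 could never advance.
        raise ValueError("page_size must be at least 2")

    start = ""   # assumed to sort before every real key
    last = None  # last key already yielded
    while True:
        page = fetch_keys(start, page_size)
        for key in page:
            # Skip the key repeated at the start of each later page.
            if last is None or key > last:
                yield key
                last = key
        if len(page) < page_size:
            return  # a short page means the range is exhausted
        start = page[-1]
```

An export job would call this once and fetch and write out the row for
each yielded key. Unlike the timestamp-restricted HBase scan, this loop
by itself gives no consistent point-in-time view.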

Edmond

--
Joe Van Dyk
http://fixieconsulting.com


