I will +1 the recommendation on using tablesnap over EBS. S3 is at least predictable.
Additionally, from a practical standpoint, you may want to back up your sstables somewhere. If you use S3, it's easy to pull just the new sstables out via the aws-cli tools (s3 sync) to your remote, non-AWS server, without incurring the overhead of routinely backing up the entire dataset. For a nontrivial database, this matters quite a bit.

On Fri, Mar 28, 2014 at 1:21 PM, Laing, Michael <michael.la...@nytimes.com> wrote:

> As I tried to say, EBS snapshots require much care or you get corruption
> such as you have encountered.
>
> Does Cassandra quiesce the file system after a snapshot using fsfreeze or
> xfs_freeze? Somehow I doubt it...
>
>
> On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
>> I have a nagging memory of reading about issues with virtualization and
>> not actually having durable versions of your data even after an fsync
>> (within the VM). Googling around led me to this post:
>> http://petercai.com/virtualization-is-bad-for-database-integrity/
>>
>> It's possible you're hitting this issue, either with the virtualization
>> layer or with EBS itself. Just a shot in the dark though, other people
>> would likely know much more than I.
>>
>>
>>
>> On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie <ussray...@yahoo.com> wrote:
>>
>>> Robert,
>>>
>>> That is what I thought as well. But apparently something is happening.
>>> The only way I can get away with doing this is adding a sleep 60 right
>>> after the nodetool snapshot is executed. I can reproduce this 100% of the
>>> time by not issuing a sleep after nodetool snapshot.
>>>
>>> This is the error.
>>>
>>> ERROR [SSTableBatchOpen:1] 2014-03-28 17:08:14,290 CassandraDaemon.java (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
>>> org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
>>>     at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
>>>     at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
>>>     at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
>>>     at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407)
>>>     at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198)
>>>     at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
>>>     at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:744)
>>> Caused by: java.io.EOFException
>>>     at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>>>     at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>>>     at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>>>     at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
>>>     ... 11 more
>>>
>>>
>>> On Friday, March 28, 2014 2:38 PM, Robert Coli <rc...@eventbrite.com> wrote:
>>> On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie <ussray...@yahoo.com> wrote:
>>>
>>> Thank you for your quick response.
>>>
>>> Is there a way to tell when a snapshot is completely done?
>>>
>>>
>>> IIRC, the JMX call blocks until the snapshot completes.
>>> It should be done when nodetool returns.
>>>
>>> =Rob
>>>
>>
>> --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> skype: rustyrazorblade
>

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
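
P.S. For anyone who wants to script the incremental pull from S3 mentioned at the top of this message, here is a rough sketch in Python. It is only an illustration under assumptions: the bucket prefix, the local directory, and the function names are all made up for the example, and it assumes aws-cli is installed and configured on the backup host. The only real commands it relies on are `aws s3 sync` and its `--dryrun` flag. `missing_locally` just shows the idea behind "pull only what is new"; `aws s3 sync` does its own comparison internally.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def build_sync_command(bucket_prefix: str, local_dir: str, dry_run: bool = True) -> List[str]:
    """Build an `aws s3 sync` command that copies objects from S3 to local_dir.

    `aws s3 sync` only transfers objects that are missing or changed at the
    destination, so repeated runs do not re-download the whole dataset.
    """
    cmd = ["aws", "s3", "sync", bucket_prefix, local_dir]
    if dry_run:
        # --dryrun prints what would be copied without transferring anything.
        cmd.append("--dryrun")
    return cmd


def missing_locally(remote_keys: Iterable[str], local_dir: str) -> List[str]:
    """Return the remote keys that have no file at the same relative path locally."""
    root = Path(local_dir)
    return sorted(key for key in remote_keys if not (root / key).exists())


def run_sync(bucket_prefix: str, local_dir: str, dry_run: bool = True) -> int:
    """Run the sync and return the process exit code (requires aws-cli on PATH)."""
    return subprocess.run(build_sync_command(bucket_prefix, local_dir, dry_run)).returncode
```

Something like `run_sync("s3://example-bucket/node1", "/backups/node1")` from cron would be the usual shape; the dry run is on by default so you can check what it intends to copy before letting it loose.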