Re: Cassandra Snapshots giving me corrupted SSTables in the logs

Jonathan Haddad Fri, 28 Mar 2014 13:29:29 -0700

I will +1 the recommendation on using tablesnap over EBS.  S3 is at least
predictable.


Additionally, from a practical standpoint, you may want to back up your
sstables somewhere.  If you use S3, it's easy to pull just the new tables
out via the aws-cli tools (s3 sync) to your remote, non-AWS server, without
incurring the overhead of routinely backing up the entire dataset.  For a
non-trivial database, this matters quite a bit.
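
A rough sketch of what that incremental pull could look like, under some
assumptions: the bucket name, key prefix, and local path are made-up
placeholders, and the helper names are hypothetical.  The only thing it
relies on is "aws s3 sync", which copies objects that are new or changed
at the source, plus its --dryrun flag for previewing.

```python
import subprocess


def build_sync_command(bucket, prefix, local_dir, dry_run=False):
    """Build an `aws s3 sync` command line as an argument list.

    `aws s3 sync` only transfers objects that are missing or changed at
    the destination, so repeated runs pull just the new sstables.
    """
    prefix = prefix.strip("/")
    source = f"s3://{bucket}/{prefix}/" if prefix else f"s3://{bucket}/"
    cmd = ["aws", "s3", "sync", source, local_dir]
    if dry_run:
        # --dryrun lists what would be copied without transferring anything.
        cmd.append("--dryrun")
    return cmd


def pull_new_sstables(bucket, prefix, local_dir, dry_run=False,
                      runner=subprocess.run):
    """Run the sync; `runner` is injectable so it can be stubbed out."""
    cmd = build_sync_command(bucket, prefix, local_dir, dry_run)
    return runner(cmd, check=True)


if __name__ == "__main__":
    # Placeholder names (assumptions); this prints the command rather than
    # running it.
    print(" ".join(build_sync_command(
        "example-backups", "cassandra/node1", "/srv/backups/node1",
        dry_run=True)))
```

Run from cron on the remote box, something like this only moves whatever
was uploaded since the previous run.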


On Fri, Mar 28, 2014 at 1:21 PM, Laing, Michael
<michael.la...@nytimes.com> wrote:

> As I tried to say, EBS snapshots require much care or you get corruption
> such as you have encountered.
>
> Does Cassandra quiesce the file system after a snapshot using fsfreeze or
> xfs_freeze? Somehow I doubt it...
>
>
> On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
>> I have a nagging memory of reading about issues with virtualization and
>> not actually having durable versions of your data even after an fsync
>> (within the VM).  Googling around led me to this post:
>> http://petercai.com/virtualization-is-bad-for-database-integrity/
>>
>> It's possible you're hitting this issue, either with the virtualization
>> layer or with EBS itself.  Just a shot in the dark though; other people
>> would likely know much more than I do.
>>
>>
>>
>> On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie <ussray...@yahoo.com> wrote:
>>
>>> Robert,
>>>
>>> That is what I thought as well.  But apparently something is happening.
>>> The only way I can get this to work is by adding a sleep 60 right after
>>> nodetool snapshot is executed.  Without the sleep after nodetool
>>> snapshot, I can reproduce the error 100% of the time.
>>>
>>> This is the error.
>>>
>>> ERROR [SSTableBatchOpen:1] 2014-03-28 17:08:14,290 CassandraDaemon.java (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
>>> org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
>>>     at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
>>>     at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
>>>     at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
>>>     at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407)
>>>     at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198)
>>>     at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
>>>     at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:744)
>>> Caused by: java.io.EOFException
>>>     at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>>>     at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>>>     at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>>>     at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
>>>     ... 11 more
>>>
>>>
>>> On Friday, March 28, 2014 2:38 PM, Robert Coli <rc...@eventbrite.com>
>>> wrote:
>>>
>>> On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie <ussray...@yahoo.com> wrote:
>>>
>>>> Thank you for your quick response.
>>>>
>>>> Is there a way to tell when a snapshot is completely done?
>>>
>>>
>>> IIRC, the JMX call blocks until the snapshot completes. It should be
>>> done when nodetool returns.
>>>
>>> =Rob
>>>
>>>
>>>
>>
>>
>> --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> skype: rustyrazorblade
>>
>
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
