Corrupt sstables cause compaction to fail again, and again and again, ...
-------------------------------------------------------------------------
Key: CASSANDRA-2084
URL: https://issues.apache.org/jira/browse/CASSANDRA-2084
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 0.7.0
Environment: Ubuntu 10.10
Cassandra 0.7.0
4 Nodes
Reporter: Dan Hendry
I have been having some serious data corruption issues in my cluster. I suspect
a deeper, more serious Cassandra bug, but I don't know what or where it is, and
I have not found a way to reproduce the issues.
This ticket is for a behaviour I have observed where Cassandra starts
compacting a set of sstables, fails, does not clean up the tmp files, then
starts compacting the exact same set of sstables again (see logs below). After
a while, the node runs out of disk space and crashes. At the very least,
Cassandra should clean up temp files after a failed compaction. Better yet, it
should stop trying to compact that file and log which file the error occurred
for. The list of corrupt sstables does not even have to be persistent, just an
in-memory list which gets wiped out on a restart.
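To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is not Cassandra's actual code: CompactionGuard, CompactionTask, filterCandidates and compact are all hypothetical names invented for illustration. Since the sketch cannot tell which input is corrupt, it conservatively marks every input of a failed compaction as suspect.

```java
import java.io.IOError;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; none of these names exist in Cassandra. */
class CompactionGuard {
    /** In-memory only, so the list is wiped out on a restart. */
    private final Set<String> suspect = ConcurrentHashMap.newKeySet();

    /** Hypothetical stand-in for the real compaction work. */
    interface CompactionTask {
        void run(List<String> inputs, Path tmpOutput) throws IOException;
    }

    /** Drops sstables that were part of an earlier failed compaction. */
    List<String> filterCandidates(List<String> candidates) {
        List<String> usable = new ArrayList<>();
        for (String path : candidates) {
            if (!suspect.contains(path)) {
                usable.add(path);
            }
        }
        return usable;
    }

    /** Runs the task; on failure logs the inputs, marks them, removes the tmp file. */
    boolean compact(List<String> inputs, Path tmpOutput, CompactionTask task) {
        try {
            task.run(inputs, tmpOutput);
            return true;
        } catch (IOException | IOError e) {
            // The failure in this ticket surfaces as java.io.IOError, so catch both.
            System.err.println("Compaction failed for " + inputs + ": " + e);
            suspect.addAll(inputs);
            try {
                Files.deleteIfExists(tmpOutput);
            } catch (IOException cleanupFailure) {
                System.err.println("Could not delete " + tmpOutput + ": " + cleanupFailure);
            }
            return false;
        }
    }
}
```

With something like this, the scheduler would pass its candidate list through filterCandidates before each compaction, so the same four sstables would not be picked again until the node restarts, and the disk would not fill up with leftover tmp files.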
Here is a sample log: the same 4 sstables are compacted, the compaction fails,
and then they are compacted again.
INFO [CompactionExecutor:1] 2011-01-31 13:08:26,434 CompactionManager.java
(line 272) Compacting
[org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-562-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-692-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-773-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-940-Data.db')]
INFO [HintedHandoff:1] 2011-01-31 13:08:28,878 HintedHandOffManager.java (line
226) Could not complete hinted handoff to /192.168.4.16
INFO [HintedHandoff:1] 2011-01-31 13:08:28,879 ColumnFamilyStore.java (line
648) switching in a fresh Memtable for HintsColumnFamily at
CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1296500864696.log',
position=104140211)
INFO [HintedHandoff:1] 2011-01-31 13:08:28,879 ColumnFamilyStore.java (line
952) Enqueuing flush of Memtable-HintsColumnFamily@1652350488(1155546 bytes,
20839 operations)
INFO [FlushWriter:1] 2011-01-31 13:08:28,879 Memtable.java (line 155) Writing
Memtable-HintsColumnFamily@1652350488(1155546 bytes, 20839 operations)
INFO [FlushWriter:1] 2011-01-31 13:08:29,199 Memtable.java (line 162)
Completed flushing /var/lib/cassandra/data/system/HintsColumnFamily-e-9-Data.db
(1075487 bytes)
INFO [GossipStage:1] 2011-01-31 13:08:45,508 Gossiper.java (line 569)
InetAddress /192.168.4.16 is now UP
INFO [COMMIT-LOG-WRITER] 2011-01-31 13:08:59,736 CommitLogSegment.java (line
50) Creating new commitlog segment
/var/lib/cassandra/commitlog/CommitLog-1296500939735.log
INFO [MutationStage:8] 2011-01-31 13:09:15,868 ColumnFamilyStore.java (line
648) switching in a fresh Memtable for UserSearch at
CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1296500939735.log',
position=56028937)
INFO [MutationStage:8] 2011-01-31 13:09:15,868 ColumnFamilyStore.java (line
952) Enqueuing flush of Memtable-UserSearch@1186863256(174163962 bytes, 2097155
operations)
INFO [FlushWriter:1] 2011-01-31 13:09:15,868 Memtable.java (line 155) Writing
Memtable-UserSearch@1186863256(174163962 bytes, 2097155 operations)
ERROR [CompactionExecutor:1] 2011-01-31 13:09:22,462
AbstractCassandraDaemon.java (line 91) Fatal exception in thread
Thread[CompactionExecutor:1,1,main]
java.io.IOError: java.io.EOFException: attempted to skip 776104308 bytes but
only skipped 8469212
at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:78)
at
org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:178)
at
org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:143)
at
org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:135)
at
org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
at
org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
at
org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
at
org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
at
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at
org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
at
org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
at
org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:323)
at
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:122)
at
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:92)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException: attempted to skip 776104308 bytes but only
skipped 8469212
at
org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:52)
at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:69)
... 20 more
INFO [CompactionExecutor:1] 2011-01-31 13:09:22,463 CompactionManager.java
(line 272) Compacting
[org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-562-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-692-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-773-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-940-Data.db')]
INFO [FlushWriter:1] 2011-01-31 13:09:29,010 Memtable.java (line 162)
Completed flushing /var/lib/cassandra/data/kikmetrics/UserSearch-e-1264-Data.db
(184687455 bytes)
INFO [COMMIT-LOG-WRITER] 2011-01-31 13:09:38,221 CommitLogSegment.java (line
50) Creating new commitlog segment
/var/lib/cassandra/commitlog/CommitLog-1296500978221.log
INFO [COMMIT-LOG-WRITER] 2011-01-31 13:10:15,781 CommitLogSegment.java (line
50) Creating new commitlog segment
/var/lib/cassandra/commitlog/CommitLog-1296501015781.log
ERROR [CompactionExecutor:1] 2011-01-31 13:10:29,139
AbstractCassandraDaemon.java (line 91) Fatal exception in thread
Thread[CompactionExecutor:1,1,main]
java.io.IOError: java.io.EOFException: attempted to skip 776104308 bytes but
only skipped 8469212
at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:78)
at
org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:178)
at
org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:143)
at
org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:135)
at
org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
at
org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
at
org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
at
org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
at
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at
org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
at
org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
at
org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:323)
at
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:122)
at
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:92)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException: attempted to skip 776104308 bytes but only
skipped 8469212
at
org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:52)
at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:69)
... 20 more
INFO [CompactionExecutor:1] 2011-01-31 13:10:29,148 CompactionManager.java
(line 272) Compacting
[org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-562-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-692-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-773-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-940-Data.db')]