Tommi Koivula created CASSANDRA-5158:
----------------------------------------
Summary: Compaction fails after hours of frequent updates
Key: CASSANDRA-5158
URL: https://issues.apache.org/jira/browse/CASSANDRA-5158
Project: Cassandra
Issue Type: Bug
Affects Versions: 1.1.8
Environment: Virtualized Ubuntu 12.04 (linux 2.6.18-274.3.1.el5),
SAN disk,
tested with Sun JRE 1.6.0_24 and 1.6.0_38-b05,
8 GB of RAM
Reporter: Tommi Koivula
Data corruption occurs in one of our customers' environments after ~10 hours
of running under a constant update load (1-2 writes/sec). Compaction of one of
the column families starts failing after about 10 hours, and scrub is not able
to fix it either. Writes do not produce any errors, and compaction of the other
column families works fine. We are not doing any reads at this stage and always
start from a fresh Cassandra instance (no data) at startup. After the failure
we have tried restarting Cassandra, which succeeds, and retrieving all keys,
which fails with a similar error.
The Cassandra configuration is the default except for directory locations. The
number of rows in the table is always below 100k and does not grow, as our
application purges old data on an hourly basis. The problematic table is very
simple, with only one column:
CREATE TABLE xxx ( KEY uuid PRIMARY KEY, value text) WITH gc_grace_seconds=0;
We have tried enabling and disabling JNA, with no effect. The JRE was also
upgraded to 1.6.0_38-b05 with no change.
The problem reproduces every time at some point, but we have not been able to
reproduce it any faster, so it seems to require several hours of updates. We
had this problem with Cassandra 1.1.0 and upgraded to 1.1.8, but the upgrade
did not fix it. The cluster has only one node, so the application suffers a
total data loss.
The error message varies a bit between runs; here are two traces produced by
two different Cassandra 1.1.8 runs:
[CompactionExecutor:15]|||Exception in thread Thread[CompactionExecutor:15,1,main]
java.lang.NegativeArraySizeException: null
    at org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:57)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:144)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:86)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:70)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:189)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:153)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:145)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:40)
    at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:149)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:126)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:100)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
    at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662) [na:1.6.0_38]
[CompactionExecutor:127]|org.apache.cassandra.utils.OutputHandler$LogOutput||Non-fatal error reading row (stacktrace follows)
java.io.IOError: java.io.EOFException: bloom filter claims to be -793390915 bytes, longer than entire row size -3407588055136546636
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:156)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:86)
    at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:170)
    at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:496)
    at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:485)
    at org.apache.cassandra.db.compaction.CompactionManager.access$300(CompactionManager.java:69)
    at org.apache.cassandra.db.compaction.CompactionManager$4.perform(CompactionManager.java:235)
    at org.apache.cassandra.db.compaction.CompactionManager$3.call(CompactionManager.java:205)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662) [na:1.6.0_38]
Caused by: java.io.EOFException: bloom filter claims to be -793390915 bytes, longer than entire row size -3407588055136546636
    at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:129)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:121)
    ... 12 common frames omitted
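For context, here is a minimal sketch of what the failing read path appears to
be doing, reconstructed only from the stack frames and messages above (an
assumption for illustration, not the actual Cassandra source): each row in the
Data file is prefixed by a serialized row-level bloom filter whose length is
read as an int; with corrupt data that length comes back negative, which would
explain both the NegativeArraySizeException during compaction and the
EOFException during scrub.

import java.io.DataInput;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical names; only the error wording and the negative values are from our logs.
public class BloomFilterReadSketch
{
    static byte[] readRowBloomFilter(DataInput in, long remainingRowBytes) throws IOException
    {
        int size = in.readInt();            // corrupt sstable -> e.g. -793390915
        if (size > remainingRowBytes)
            throw new EOFException("bloom filter claims to be " + size
                                   + " bytes, longer than entire row size " + remainingRowBytes);
        byte[] serialized = new byte[size]; // negative size -> NegativeArraySizeException
        in.readFully(serialized);
        return serialized;
    }
}

Either way, the length prefix read from the sstable is already garbage at that
point, which would be why scrub trips over the same rows as compaction.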
The data directory for the table looks like this:
$ ls -l data/p/m
total 100644
-rw-r--r-- 1 user group 3342 Jan 14 14:08 p-m_nodes-hf-1-CompressionInfo.db
-rw-r--r-- 1 user group 8809019 Jan 14 14:08 p-m_nodes-hf-1-Data.db
-rw-r--r-- 1 user group 155632 Jan 14 14:08 p-m_nodes-hf-1-Filter.db
-rw-r--r-- 1 user group 11441829 Jan 14 14:08 p-m_nodes-hf-1-Index.db
-rw-r--r-- 1 user group 4340 Jan 14 14:08 p-m_nodes-hf-1-Statistics.db
-rw-r--r-- 1 user group 3310 Jan 14 17:06 p-m_nodes-hf-2-CompressionInfo.db
-rw-r--r-- 1 user group 8948738 Jan 14 17:06 p-m_nodes-hf-2-Data.db
-rw-r--r-- 1 user group 138992 Jan 14 17:06 p-m_nodes-hf-2-Filter.db
-rw-r--r-- 1 user group 11531149 Jan 14 17:06 p-m_nodes-hf-2-Index.db
-rw-r--r-- 1 user group 4340 Jan 14 17:06 p-m_nodes-hf-2-Statistics.db
-rw-r--r-- 1 user group 3270 Jan 14 21:19 p-m_nodes-hf-3-CompressionInfo.db
-rw-r--r-- 1 user group 9246229 Jan 14 21:19 p-m_nodes-hf-3-Data.db
-rw-r--r-- 1 user group 119776 Jan 14 21:19 p-m_nodes-hf-3-Filter.db
-rw-r--r-- 1 user group 11633474 Jan 14 21:19 p-m_nodes-hf-3-Index.db
-rw-r--r-- 1 user group 4340 Jan 14 21:19 p-m_nodes-hf-3-Statistics.db
-rw-r--r-- 1 user group 3350 Jan 15 06:35 p-m_nodes-hf-4-CompressionInfo.db
-rw-r--r-- 1 user group 8766723 Jan 15 06:35 p-m_nodes-hf-4-Data.db
-rw-r--r-- 1 user group 158792 Jan 15 06:35 p-m_nodes-hf-4-Filter.db
-rw-r--r-- 1 user group 11425708 Jan 15 06:35 p-m_nodes-hf-4-Index.db
-rw-r--r-- 1 user group 4340 Jan 15 06:35 p-m_nodes-hf-4-Statistics.db
-rw-r--r-- 1 user group 3318 Jan 15 10:27 p-m_nodes-hf-6-CompressionInfo.db
-rw-r--r-- 1 user group 8748453 Jan 15 10:27 p-m_nodes-hf-6-Data.db
-rw-r--r-- 1 user group 141904 Jan 15 10:27 p-m_nodes-hf-6-Filter.db
-rw-r--r-- 1 user group 11518264 Jan 15 10:27 p-m_nodes-hf-6-Index.db
-rw-r--r-- 1 user group 4340 Jan 15 10:27 p-m_nodes-hf-6-Statistics.db
Commit log directory:
$ ls -l commitlog/
total 164012
-rw-r--r-- 1 user group 33554432 Jan 14 12:44 CommitLog-1358164638228.log
-rw-r--r-- 1 user group 33554432 Jan 15 10:50 CommitLog-1358164638237.log
-rw-r--r-- 1 user group 33554432 Jan 15 11:59 CommitLog-1358164638238.log
-rw-r--r-- 1 user group 33554432 Jan 15 10:27 CommitLog-1358164638239.log
-rw-r--r-- 1 user group 33554432 Jan 15 11:48 CommitLog-1358164638240.log