[
https://issues.apache.org/jira/browse/CASSANDRA-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559673#comment-13559673
]
Jonathan Ellis commented on CASSANDRA-5158:
-------------------------------------------
We are not able to reproduce this. We'd need a test case to do more.
> Compaction fails after hours of frequent updates
> ------------------------------------------------
>
> Key: CASSANDRA-5158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5158
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 1.1.8
> Environment: Virtualized Ubuntu 12.04 (linux 2.6.18-274.3.1.el5),
> SAN disk,
> tested with Sun JRE 1.6.0_24 and 1.6.0_38-b05,
> 8 GB of RAM
> Reporter: Tommi Koivula
>
> Data corruption occurs in one of our customers' environments after ~10 hours
> of running with a constant update load (1-2 writes/sec). Compaction of one of
> the column families starts failing after about 10 hours, and scrub is not able
> to fix it either. Writes do not produce any errors, and compaction of the other
> column families works fine. We are not doing any reads at this stage, and we
> always start from a fresh Cassandra instance (no data).
> After the failure we have tried restarting Cassandra, which succeeds, and
> retrieving all keys, which fails with a similar error.
> The Cassandra configuration is the default except for directory locations. The
> number of rows in the table is always less than 100k and doesn't grow, because
> our application purges old data on an hourly basis. The problematic table is
> very simple, with only one column:
> CREATE TABLE xxx ( KEY uuid PRIMARY KEY, value text) WITH
> gc_grace_seconds=0;
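> A minimal sketch of the kind of load driver that approximates this workload
> (steady single-row inserts at 1-2 writes/sec plus an hourly purge that keeps
> the table under ~100k rows) is shown below. It is illustrative only: it uses
> the modern DataStax Python driver, a placeholder contact point and keyspace
> name, and the anonymized table name from the schema above, so it is a sketch
> of the workload rather than our actual client code.
>
> import time
> import uuid
>
> from cassandra.cluster import Cluster
>
> # Illustrative reproduction loop against a single local node (placeholder
> # address and keyspace name); the real 1.1.x client used a Thrift-era API.
> cluster = Cluster(["127.0.0.1"])
> session = cluster.connect("ks")
>
> # "key" is quoted because the column was declared via the KEY keyword.
> insert = session.prepare('INSERT INTO xxx ("key", value) VALUES (?, ?)')
> delete = session.prepare('DELETE FROM xxx WHERE "key" = ?')
>
> live_keys = []
> last_purge = time.time()
>
> while True:
>     # Constant update load: roughly 2 writes per second.
>     k = uuid.uuid4()
>     session.execute(insert, (k, "payload"))
>     live_keys.append(k)
>     time.sleep(0.5)
>
>     # Purge old data roughly hourly so the table stays under ~100k rows;
>     # with gc_grace_seconds=0 the resulting tombstones are immediately
>     # purgeable at the next compaction.
>     if time.time() - last_purge > 3600:
>         for old in live_keys[:-100000]:
>             session.execute(delete, (old,))
>         del live_keys[:-100000]
>         last_purge = time.time()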
> We have tried enabling and disabling JNA, with no effect. The JRE was also
> upgraded to 1.6.0_38-b05, with no change.
> The problem reproduces every time at some point, but we haven't been able to
> reproduce it faster, so it seems to require several hours of updates.
> We had this problem with Cassandra 1.1.0 and upgraded to 1.1.8, but the
> upgrade didn't fix it. The cluster has only one node, so the application
> suffers total data loss.
> The error message varies a bit between runs; here are two produced by two
> different Cassandra 1.1.8 runs:
> [CompactionExecutor:15]|||Exception in thread Thread[CompactionExecutor:15,1,main] java.lang.NegativeArraySizeException: null
>     at org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:57)
>     at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:144)
>     at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:86)
>     at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:70)
>     at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:189)
>     at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:153)
>     at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:145)
>     at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:40)
>     at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:149)
>     at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:126)
>     at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:100)
>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
>     at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
>     at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
>     at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662) [na:1.6.0_38]
> [CompactionExecutor:127]|org.apache.cassandra.utils.OutputHandler$LogOutput||Non-fatal error reading row (stacktrace follows)
> java.io.IOError: java.io.EOFException: bloom filter claims to be -793390915 bytes, longer than entire row size -3407588055136546636
>     at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:156)
>     at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:86)
>     at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:170)
>     at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:496)
>     at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:485)
>     at org.apache.cassandra.db.compaction.CompactionManager.access$300(CompactionManager.java:69)
>     at org.apache.cassandra.db.compaction.CompactionManager$4.perform(CompactionManager.java:235)
>     at org.apache.cassandra.db.compaction.CompactionManager$3.call(CompactionManager.java:205)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662) [na:1.6.0_38]
> Caused by: java.io.EOFException: bloom filter claims to be -793390915 bytes, longer than entire row size -3407588055136546636
>     at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:129)
>     at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:121)
>     ... 12 common frames omitted
> The data directory for the table looks like this:
> $ ls -l data/p/m
> total 100644
> -rw-r--r-- 1 user group 3342 Jan 14 14:08 p-m_nodes-hf-1-CompressionInfo.db
> -rw-r--r-- 1 user group 8809019 Jan 14 14:08 p-m_nodes-hf-1-Data.db
> -rw-r--r-- 1 user group 155632 Jan 14 14:08 p-m_nodes-hf-1-Filter.db
> -rw-r--r-- 1 user group 11441829 Jan 14 14:08 p-m_nodes-hf-1-Index.db
> -rw-r--r-- 1 user group 4340 Jan 14 14:08 p-m_nodes-hf-1-Statistics.db
> -rw-r--r-- 1 user group 3310 Jan 14 17:06 p-m_nodes-hf-2-CompressionInfo.db
> -rw-r--r-- 1 user group 8948738 Jan 14 17:06 p-m_nodes-hf-2-Data.db
> -rw-r--r-- 1 user group 138992 Jan 14 17:06 p-m_nodes-hf-2-Filter.db
> -rw-r--r-- 1 user group 11531149 Jan 14 17:06 p-m_nodes-hf-2-Index.db
> -rw-r--r-- 1 user group 4340 Jan 14 17:06 p-m_nodes-hf-2-Statistics.db
> -rw-r--r-- 1 user group 3270 Jan 14 21:19 p-m_nodes-hf-3-CompressionInfo.db
> -rw-r--r-- 1 user group 9246229 Jan 14 21:19 p-m_nodes-hf-3-Data.db
> -rw-r--r-- 1 user group 119776 Jan 14 21:19 p-m_nodes-hf-3-Filter.db
> -rw-r--r-- 1 user group 11633474 Jan 14 21:19 p-m_nodes-hf-3-Index.db
> -rw-r--r-- 1 user group 4340 Jan 14 21:19 p-m_nodes-hf-3-Statistics.db
> -rw-r--r-- 1 user group 3350 Jan 15 06:35 p-m_nodes-hf-4-CompressionInfo.db
> -rw-r--r-- 1 user group 8766723 Jan 15 06:35 p-m_nodes-hf-4-Data.db
> -rw-r--r-- 1 user group 158792 Jan 15 06:35 p-m_nodes-hf-4-Filter.db
> -rw-r--r-- 1 user group 11425708 Jan 15 06:35 p-m_nodes-hf-4-Index.db
> -rw-r--r-- 1 user group 4340 Jan 15 06:35 p-m_nodes-hf-4-Statistics.db
> -rw-r--r-- 1 user group 3318 Jan 15 10:27 p-m_nodes-hf-6-CompressionInfo.db
> -rw-r--r-- 1 user group 8748453 Jan 15 10:27 p-m_nodes-hf-6-Data.db
> -rw-r--r-- 1 user group 141904 Jan 15 10:27 p-m_nodes-hf-6-Filter.db
> -rw-r--r-- 1 user group 11518264 Jan 15 10:27 p-m_nodes-hf-6-Index.db
> -rw-r--r-- 1 user group 4340 Jan 15 10:27 p-m_nodes-hf-6-Statistics.db
> Commit log directory:
> $ ls -l commitlog/
> total 164012
> -rw-r--r-- 1 user group 33554432 Jan 14 12:44 CommitLog-1358164638228.log
> -rw-r--r-- 1 user group 33554432 Jan 15 10:50 CommitLog-1358164638237.log
> -rw-r--r-- 1 user group 33554432 Jan 15 11:59 CommitLog-1358164638238.log
> -rw-r--r-- 1 user group 33554432 Jan 15 10:27 CommitLog-1358164638239.log
> -rw-r--r-- 1 user group 33554432 Jan 15 11:48 CommitLog-1358164638240.log