[jira] [Created] (CASSANDRA-5158) Compaction fails after hours of frequent updates

Tommi Koivula (JIRA) Tue, 15 Jan 2013 04:54:25 -0800

Tommi Koivula created CASSANDRA-5158:
----------------------------------------


             Summary: Compaction fails after hours of frequent updates
                 Key: CASSANDRA-5158
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5158
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 1.1.8
         Environment: Virtualized Ubuntu 12.04 (linux 2.6.18-274.3.1.el5), 
SAN disk, 
tested with Sun JRE 1.6.0_24 and 1.6.0_38-b05, 
8 GB of RAM

            Reporter: Tommi Koivula



Data corruption occurs in one of our customers' environments after ~10 hours 
of running under a constant update load (1-2 writes/sec). Compaction of one of 
the column families starts failing after about 10 hours, and scrub is not able 
to fix it either. Writes do not produce any errors, and compaction of other 
column families works fine. We are not doing any reads at this stage, and each 
run starts from a fresh Cassandra instance (no data). After the failure we have 
tried restarting Cassandra, which succeeds, and retrieving all keys, which 
fails with a similar error.

The Cassandra configuration is the default except for directory locations. The 
number of rows in the table is always less than 100k and doesn't increase, as 
our application purges old data on an hourly basis. The problematic table is 
very simple, with only one column:

   CREATE TABLE xxx ( KEY uuid PRIMARY KEY, value text) WITH gc_grace_seconds=0;

We have tried enabling and disabling JNA, with no effect. The JRE was also 
upgraded to 1.6.0_38-b05 with no change.

The problem reproduces every time at some point, but we haven't been able to 
reproduce it faster, so it seems to require several hours of updates. We had 
this problem with Cassandra 1.1.0 and upgraded to 1.1.8, but the upgrade 
didn't fix it. The cluster has only one node, so the application suffers 
total data loss.

The error message varies a bit between runs; here are two produced by two 
different Cassandra 1.1.8 runs:

[CompactionExecutor:15]|||Exception in thread Thread[CompactionExecutor:15,1,main] java.lang.NegativeArraySizeException: null
        at org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:57)
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:144)
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:86)
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:70)
        at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:189)
        at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:153)
        at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:145)
        at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:40)
        at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:149)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:126)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:100)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
        at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662) [na:1.6.0_38]

[CompactionExecutor:127]|org.apache.cassandra.utils.OutputHandler$LogOutput||Non-fatal error reading row (stacktrace follows)
java.io.IOError: java.io.EOFException: bloom filter claims to be -793390915 bytes, longer than entire row size -3407588055136546636
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:156)
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:86)
        at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:170)
        at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:496)
        at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:485)
        at org.apache.cassandra.db.compaction.CompactionManager.access$300(CompactionManager.java:69)
        at org.apache.cassandra.db.compaction.CompactionManager$4.perform(CompactionManager.java:235)
        at org.apache.cassandra.db.compaction.CompactionManager$3.call(CompactionManager.java:205)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662) [na:1.6.0_38]
Caused by: java.io.EOFException: bloom filter claims to be -793390915 bytes, longer than entire row size -3407588055136546636
        at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:129)
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:121)
        ... 12 common frames omitted
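
Both traces are consistent with a length field being read from bytes that do 
not actually hold a length. The following is only an illustrative sketch of 
that failure mode, not the Cassandra source: the class LengthPrefixSketch and 
its methods are hypothetical, and the only value taken from the logs above is 
-793390915. It shows how trusting a corrupt 4-byte length prefix gives a 
NegativeArraySizeException, while validating it first gives an EOFException 
similar in spirit to the one scrub reports.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical illustration; names are not taken from Cassandra.
public class LengthPrefixSketch {

    // Trusts the 4-byte length prefix and allocates a buffer of that size.
    public static byte[] readUnchecked(DataInputStream in) throws IOException {
        int size = in.readInt();
        byte[] buf = new byte[size]; // NegativeArraySizeException if size < 0
        in.readFully(buf);
        return buf;
    }

    // Validates the length prefix against the bytes known to remain.
    public static byte[] readChecked(DataInputStream in, long remaining) throws IOException {
        int size = in.readInt();
        if (size < 0 || size > remaining) {
            throw new EOFException("length prefix claims " + size
                    + " bytes, but only " + remaining + " remain");
        }
        byte[] buf = new byte[size];
        in.readFully(buf);
        return buf;
    }

    // Four bytes that decode (big-endian) to the negative size seen in the log.
    public static DataInputStream corruptInput() {
        byte[] bytes = ByteBuffer.allocate(4).putInt(-793390915).array();
        return new DataInputStream(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws IOException {
        try {
            readUnchecked(corruptInput());
        } catch (NegativeArraySizeException e) {
            System.out.println("unchecked read failed: " + e);
        }
        try {
            readChecked(corruptInput(), 0);
        } catch (EOFException e) {
            System.out.println("checked read failed: " + e.getMessage());
        }
    }
}
```

If this reading is right, the two different exceptions would be the same 
on-disk corruption surfacing through two different read paths (compaction and 
scrub), rather than two separate problems.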

Data directory for the table looks like this:

 $ ls -l data/p/m
total 100644
-rw-r--r-- 1 user group     3342 Jan 14 14:08 p-m_nodes-hf-1-CompressionInfo.db
-rw-r--r-- 1 user group  8809019 Jan 14 14:08 p-m_nodes-hf-1-Data.db
-rw-r--r-- 1 user group   155632 Jan 14 14:08 p-m_nodes-hf-1-Filter.db
-rw-r--r-- 1 user group 11441829 Jan 14 14:08 p-m_nodes-hf-1-Index.db
-rw-r--r-- 1 user group     4340 Jan 14 14:08 p-m_nodes-hf-1-Statistics.db
-rw-r--r-- 1 user group     3310 Jan 14 17:06 p-m_nodes-hf-2-CompressionInfo.db
-rw-r--r-- 1 user group  8948738 Jan 14 17:06 p-m_nodes-hf-2-Data.db
-rw-r--r-- 1 user group   138992 Jan 14 17:06 p-m_nodes-hf-2-Filter.db
-rw-r--r-- 1 user group 11531149 Jan 14 17:06 p-m_nodes-hf-2-Index.db
-rw-r--r-- 1 user group     4340 Jan 14 17:06 p-m_nodes-hf-2-Statistics.db
-rw-r--r-- 1 user group     3270 Jan 14 21:19 p-m_nodes-hf-3-CompressionInfo.db
-rw-r--r-- 1 user group  9246229 Jan 14 21:19 p-m_nodes-hf-3-Data.db
-rw-r--r-- 1 user group   119776 Jan 14 21:19 p-m_nodes-hf-3-Filter.db
-rw-r--r-- 1 user group 11633474 Jan 14 21:19 p-m_nodes-hf-3-Index.db
-rw-r--r-- 1 user group     4340 Jan 14 21:19 p-m_nodes-hf-3-Statistics.db
-rw-r--r-- 1 user group     3350 Jan 15 06:35 p-m_nodes-hf-4-CompressionInfo.db
-rw-r--r-- 1 user group  8766723 Jan 15 06:35 p-m_nodes-hf-4-Data.db
-rw-r--r-- 1 user group   158792 Jan 15 06:35 p-m_nodes-hf-4-Filter.db
-rw-r--r-- 1 user group 11425708 Jan 15 06:35 p-m_nodes-hf-4-Index.db
-rw-r--r-- 1 user group     4340 Jan 15 06:35 p-m_nodes-hf-4-Statistics.db
-rw-r--r-- 1 user group     3318 Jan 15 10:27 p-m_nodes-hf-6-CompressionInfo.db
-rw-r--r-- 1 user group  8748453 Jan 15 10:27 p-m_nodes-hf-6-Data.db
-rw-r--r-- 1 user group   141904 Jan 15 10:27 p-m_nodes-hf-6-Filter.db
-rw-r--r-- 1 user group 11518264 Jan 15 10:27 p-m_nodes-hf-6-Index.db
-rw-r--r-- 1 user group     4340 Jan 15 10:27 p-m_nodes-hf-6-Statistics.db

Commit log directory:

$ ls -l commitlog/
total 164012
-rw-r--r-- 1 user group 33554432 Jan 14 12:44 CommitLog-1358164638228.log
-rw-r--r-- 1 user group 33554432 Jan 15 10:50 CommitLog-1358164638237.log
-rw-r--r-- 1 user group 33554432 Jan 15 11:59 CommitLog-1358164638238.log
-rw-r--r-- 1 user group 33554432 Jan 15 10:27 CommitLog-1358164638239.log
-rw-r--r-- 1 user group 33554432 Jan 15 11:48 CommitLog-1358164638240.log





