[ 
https://issues.apache.org/jira/browse/CASSANDRA-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051915#comment-13051915
 ] 

Sylvain Lebresne edited comment on CASSANDRA-2793 at 6/20/11 10:46 AM:
-----------------------------------------------------------------------

bq. Hi the issue reported was that the sstable corruption is blocking 
compaction with the consequence the bucket of sstables Cassandra wants to 
compact just grows and you get huge cpu load (from repeated attempts at 
compaction and increasing read inefficiency).

This is a dupe of CASSANDRA-2261.

bq. the trace also shows that it has just skipped the corrupted row so in fact 
it hasn't solved the problem at all.

In most cases of corruption, there is not much more we can do than skip the 
row. As long as the corruption is local and you don't use RF=1, this is 
usually not a big deal, since the row can be transferred back from a healthy 
replica with nodetool repair (which does not mean corruption is something we 
should be happy with).
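
To make the "skip the row" behaviour concrete, here is a minimal sketch. It is 
not the actual scrub code: the class name ScrubSketch, the scrub() helper and 
the idea that each row is already available as its own byte array (i.e. row 
boundaries are known independently, for example from an index) are all 
assumptions made for illustration. Only the error message is taken from the 
trace in the report.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Cassandra's implementation.
final class ScrubSketch {
    /** Reads an int length followed by that many bytes, rejecting bad lengths. */
    public static byte[] readWithLength(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Corrupt (negative) value length encountered");
        }
        if (length > in.available()) {
            throw new IOException("Value length exceeds remaining row size");
        }
        byte[] value = new byte[length];
        in.readFully(value);
        return value;
    }

    /** Keeps every row that deserializes cleanly and returns how many were skipped. */
    public static int scrub(List<byte[]> rows, List<byte[]> good) {
        int skipped = 0;
        for (byte[] row : rows) {
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(row))) {
                good.add(readWithLength(in));
            } catch (IOException e) {
                skipped++; // unreadable row: skip it and move on to the next one
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        List<byte[]> rows = new ArrayList<>();
        rows.add(new byte[] {0, 0, 0, 2, 'o', 'k'});      // length 2, valid
        rows.add(new byte[] {(byte) 0xFF, 0, 0, 0, 'x'}); // negative length, corrupt
        List<byte[]> good = new ArrayList<>();
        System.out.println(scrub(rows, good) + " skipped, " + good.size() + " kept");
        // prints "1 skipped, 1 kept"
    }
}
```

The skipped row is simply lost on this node, which is why RF > 1 plus a repair 
matters.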

bq. The corruption itself is also an issue

Corruption comes in two forms: either we have a bug, or the corruption is 
external (a bad hard drive, for instance). Hard drive corruption does happen 
and there is not much we can do about it (well, actually we should use 
checksums to at least detect it better: CASSANDRA-1717). As for a possible 
bug, since I see this happens on a Super column family, it could be due to a 
race fixed by CASSANDRA-2675.
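
As a rough illustration of what checksum-based detection could look like (this 
is only a sketch, not the design proposed in CASSANDRA-1717; the class 
ChecksumSketch and its methods are hypothetical names), a writer would store a 
CRC next to each block and a reader would verify it before deserializing:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: per-block CRC32 verification using the JDK's java.util.zip.CRC32.
final class ChecksumSketch {
    /** Computes a CRC32 over the block; a writer would store this alongside the data. */
    public static long checksum(byte[] block) {
        CRC32 crc = new CRC32();
        crc.update(block, 0, block.length);
        return crc.getValue();
    }

    /** Returns true when the block still matches the checksum recorded at write time. */
    public static boolean isIntact(byte[] block, long expected) {
        return checksum(block) == expected;
    }

    public static void main(String[] args) {
        byte[] block = "row data".getBytes(StandardCharsets.UTF_8);
        long stored = checksum(block);
        System.out.println(isIntact(block, stored)); // prints "true"
        block[3] ^= 0x01; // simulate a single flipped bit on disk
        System.out.println(isIntact(block, stored)); // prints "false"
    }
}
```

This only detects the corruption earlier and more reliably; it does not recover 
the data, which still has to come from another replica.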





  
> SSTable "Corrupt (negative) value length encountered" exception blocks 
> compaction.
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2793
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2793
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>         Environment: Ubuntu
>            Reporter: Dominic Williams
>
> A node was consistently experiencing high CPU load. Examination of the logs 
> showed that compaction of an sstable was failing with an error:
>  INFO [CompactionExecutor:1] 2011-06-17 00:18:51,676 CompactionManager.java 
> (line 395) Compacting 
> [SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6993-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6994-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6995-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6996-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6998-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7000-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7002-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7004-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7006-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7008-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7010-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7012-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7014-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7016-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7018-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7020-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7022-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7024-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7026-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7028-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7030-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7032-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7034-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7036-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7038-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7040-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7042-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7044-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7046-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7048-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7050-Data.db'),
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7052-Data.db')]
> ERROR [CompactionExecutor:1] 2011-06-17 00:19:21,446 
> AbstractCassandraDaemon.java (line 114) Fatal exception in thread 
> Thread[CompactionExecutor:1,1,main]
> java.io.IOError: java.io.IOException: Corrupt (negative) value length 
> encountered
>         at 
> org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252)
>         at 
> org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268)
>         at 
> org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227)
>         at 
> java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493)
>         at 
> java.util.concurrent.ConcurrentSkipListMap.<init>(ConcurrentSkipListMap.java:1443)
>         at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379)
>         at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362)
>         at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322)
>         at 
> org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
>         at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:201)
>         at 
> org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78)
>         at 
> org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:154)
>         at 
> org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:110)
>         at 
> org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:45)
>         at 
> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:74)
>         at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>         at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>         at 
> org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
>         at 
> org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
>         at 
> org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:448)
>         at 
> org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:124)
>         at 
> org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:94)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: Corrupt (negative) value length encountered
>         at 
> org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:315)
>         at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:99)
>         at 
> org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:248)
>         ... 26 more
> Scrub was run on the keyspace (as a last ditch measure) but this did not work:
>  INFO [CompactionExecutor:1] 2011-06-17 00:43:42,023 CompactionManager.java 
> (line 511) Scrubbing 
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7494-Data.db')
>  INFO [CompactionExecutor:1] 2011-06-17 00:43:43,317 CompactionManager.java 
> (line 652) Scrub of 
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7494-Data.db')
>  complete: 379 rows in new sstable and 0 empty (tombstoned) rows dropped
>  INFO [CompactionExecutor:1] 2011-06-17 00:43:43,317 CompactionManager.java 
> (line 511) Scrubbing 
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6994-Data.db')
>  WARN [CompactionExecutor:1] 2011-06-17 00:43:44,516 CompactionManager.java 
> (line 606) Non-fatal error reading row (stacktrace follows)
> java.io.IOError: java.io.IOException: Corrupt (negative) value length 
> encountered
>         at 
> org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252)
>         at 
> org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268)
>         at 
> org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227)
>         at 
> java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493)
>         at 
> java.util.concurrent.ConcurrentSkipListMap.<init>(ConcurrentSkipListMap.java:1443)
>         at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379)
>         at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362)
>         at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322)
>         at 
> org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
>         at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:201)
>         at 
> org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78)
>         at 
> org.apache.cassandra.db.CompactionManager.getCompactedRow(CompactionManager.java:783)
>         at 
> org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:590)
>         at 
> org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
>         at 
> org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: Corrupt (negative) value length encountered
>         at 
> org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:315)
>         at 
> org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:99)
>         at 
> org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:248)
>         ... 19 more
>  WARN [CompactionExecutor:1] 2011-06-17 00:43:44,517 CompactionManager.java 
> (line 640) Row at 9517800 is unreadable; skipping to next
>  INFO [CompactionExecutor:1] 2011-06-17 00:43:45,073 CompactionManager.java 
> (line 652) Scrub of 
> SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6994-Data.db')
>  complete: 1029 rows in new sstable and 0 empty (tombstoned) rows dropped
>  WARN [CompactionExecutor:1] 2011-06-17 00:43:45,073 CompactionManager.java 
> (line 654) Unable to recover 1 rows that were skipped.  You can attempt 
> manual recovery from the pre-scrub snapshot.  You can also run nodetool 
> repair to transfer the data from a healthy replica, if any

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
