[
https://issues.apache.org/jira/browse/CASSANDRA-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420746#comment-13420746
]
Sylvain Lebresne commented on CASSANDRA-4456:
---------------------------------------------
Actually, I think this can happen even when snapshots are not used, since an
sstable can finish being compacted between the time we choose the sstables for
repair and the time we create the CompactionController for the validation
compaction. In particular, I wonder whether Michael and Mike used -snapshot for
their repair. That said, repairing on a snapshot does make this much more
likely to happen.
But I don't think we need to call getOverlappingSSTables at all for repair in
the first place, since it is only used to decide whether we can purge, and
repair does not purge. Attaching a simple patch to skip the call entirely.
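
To make the race concrete, here is a toy model of it. This is only a sketch
and not the attached patch or the real Cassandra code: the class
ValidationRaceSketch, its methods, and the mayPurge flag are all hypothetical
names, and the check that every chosen sstable is still live is an assumption
about what the failing assertion guards. It throws an exception instead of
using assert because Java assertions are disabled by default.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Toy model only: sstables are plain strings and the "store" is a set of live names.
public class ValidationRaceSketch {
    private final Set<String> live = new HashSet<>();

    public ValidationRaceSketch(String... names) {
        live.addAll(Arrays.asList(names));
    }

    // Step 1 of a repair: pick the sstables to validate (a copy of the live set).
    public Set<String> chooseForRepair() {
        return new HashSet<>(live);
    }

    // A regular compaction finishing: its inputs leave the live set, its output joins it.
    public void finishCompaction(Set<String> inputs, String output) {
        live.removeAll(inputs);
        live.add(output);
    }

    // Assumed stand-in for the overlap lookup: it expects every input to still be live.
    public Set<String> overlapping(Set<String> chosen) {
        for (String s : chosen) {
            if (!live.contains(s)) {
                throw new IllegalStateException(s + " is no longer live");
            }
        }
        Set<String> others = new HashSet<>(live);
        others.removeAll(chosen);
        return others;
    }

    // Idea of the fix: only look up overlaps when the compaction may purge.
    // A validation compaction never purges, so it skips the lookup entirely.
    public Set<String> controllerOverlaps(Set<String> chosen, boolean mayPurge) {
        return mayPurge ? overlapping(chosen) : Collections.<String>emptySet();
    }

    public static void main(String[] args) {
        ValidationRaceSketch store = new ValidationRaceSketch("sstable-1", "sstable-2");
        Set<String> chosen = store.chooseForRepair();
        // The race: a compaction completes after the choice, before the controller is built.
        store.finishCompaction(chosen, "sstable-3");
        try {
            store.controllerOverlaps(chosen, true);
            System.out.println("purging path: no failure");
        } catch (IllegalStateException e) {
            System.out.println("purging path failed: " + e.getMessage());
        }
        System.out.println("validation path overlaps: " + store.controllerOverlaps(chosen, false));
    }
}
```

In this model the lookup fails as soon as a compaction slips in between the two
steps, with or without snapshots, while the validation path is unaffected
because it never performs the lookup.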
> AssertionError in ColumnFamilyStore.getOverlappingSSTables() during repair
> --------------------------------------------------------------------------
>
> Key: CASSANDRA-4456
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4456
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1.2
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Sylvain Lebresne
> Fix For: 1.1.3
>
> Attachments: 4456.txt
>
>
> We have hit the following exception on several nodes while running repairs
> across our 1.1.2 ring. We've observed it happen on either the node executing
> the repair or a participating replica in the repair operation. The result in
> either case is that the repair hangs.
> ERROR [ValidationExecutor:9] 2012-07-21 01:54:03,019 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[ValidationExecutor:9,1,main]
> java.lang.AssertionError
>     at org.apache.cassandra.db.ColumnFamilyStore.getOverlappingSSTables(ColumnFamilyStore.java:874)
>     at org.apache.cassandra.db.compaction.CompactionController.<init>(CompactionController.java:69)
>     at org.apache.cassandra.db.compaction.CompactionManager$ValidationCompactionIterable.<init>(CompactionManager.java:834)
>     at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:698)
>     at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:68)
>     at org.apache.cassandra.db.compaction.CompactionManager$8.call(CompactionManager.java:438)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> In building this ring we migrated sstables from an identical 0.8.8 ring by:
> 1. Creating the schema on our new 1.1.2 ring.
> 2. Rsyncing over sstables from 0.8.8 ring.
> 3. Renaming the sstables to match the directory and file naming structure of
> 1.1.x.
> 4. Running nodetool refresh <keyspace> <cf> for each CF on each node.
> 5. Running nodetool upgradesstables for each CF on each node.
> When those steps had completed, we began rolling repairs. Not all of the
> repair operations have hit the exception -- some of the repairs have
> completed successfully.