[
https://issues.apache.org/jira/browse/CASSANDRA-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289741#comment-15289741
]
Yuki Morishita commented on CASSANDRA-11750:
--------------------------------------------
So, to be clear, the issue happens when one of the {{system}} tables is corrupted.
In the description above, the OP tried to scrub the {{system.compactions_in_progress}}
table, but the actual exception happened while loading the schema (which opens all
system tables), not while scrubbing SSTables.
If the {{system}} tables are fine, then scrubbing continues to work in 2.1/2.2.
In 3.0 and above, the schema moved to its own keyspace, so in those versions, if the
schema SSTables are OK, then you can scrub the system keyspace.
Backporting CASSANDRA-11578 to 2.1 and 2.2 (and even 3.0) should probably do
the job.
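As a practical aside: the exception names a corrupt {{-CompressionInfo.db}} component, which has to be located and dealt with (snapshotted or removed) before scrubbing can proceed. A minimal sketch for finding such components on disk, assuming the default data directory layout; {{DATA_DIR}} here is a placeholder, use the {{data_file_directories}} value from your {{cassandra.yaml}}:

```shell
# List compression metadata components under the system keyspace, so the
# file named in the CorruptSSTableException can be located for snapshot
# or removal before re-running sstablescrub.
DATA_DIR="${DATA_DIR:-/cassandra/data}"
find "$DATA_DIR/system" -name '*-CompressionInfo.db' -print 2>/dev/null || true
```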
> Offline scrub should not abort when it hits corruption
> ------------------------------------------------------
>
> Key: CASSANDRA-11750
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11750
> Project: Cassandra
> Issue Type: Bug
> Reporter: Adam Hattrell
> Assignee: Yuki Morishita
> Priority: Minor
> Labels: Tools
>
> Hit a failure on startup due to corruption of some SSTables in the system
> keyspace. Deleted the listed file and restarted - the node came down again,
> citing another file.
> Figured I may as well run scrub to clean up all the files. Got the
> following error:
> {noformat}
> sstablescrub system compaction_history
> ERROR 17:21:34 Exiting forcefully due to file system exception on startup, disk failure policy "stop"
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-1936-CompressionInfo.db
>
> at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:169) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:741) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:692) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:480) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:376) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:523) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_79]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
> Caused by: java.io.EOFException: null
> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.7.0_79]
> at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_79]
> at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_79]
> at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> ... 14 common frames omitted
> {noformat}
> I guess it might be by design - but I'd argue that I should at least have the
> option to continue and let it do its thing. I'd prefer that sstablescrub
> ignored the disk failure policy.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)