[
https://issues.apache.org/jira/browse/CASSANDRA-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289741#comment-15289741
]
Yuki Morishita commented on CASSANDRA-11750:
--------------------------------------------
So, to be clear, the issue happens when one of the {{system}} tables is corrupted.
In the description above, the OP tried to scrub the {{system.compactions_in_progress}}
table, but the actual exception happened while loading the schema (which opens all
system tables), not while scrubbing SSTables.
If the {{system}} tables are fine, then scrubbing continues to work in 2.1/2.2.
In 3.0 and above, the schema moved to its own keyspace, so in those versions, if the
schema SSTables are OK, then you can scrub the system keyspace.
Backporting CASSANDRA-11578 to 2.1 and 2.2 (and even 3.0) should probably do
the job.
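As a practical aside: the exception names a corrupt {{-CompressionInfo.db}} component, which has to be located and dealt with (snapshotted or removed) before scrubbing can proceed. A minimal sketch for finding such components on disk, assuming the default data directory layout; {{DATA_DIR}} here is a placeholder, use the {{data_file_directories}} value from your {{cassandra.yaml}}:

```shell
# List compression metadata components under the system keyspace, so the
# file named in the CorruptSSTableException can be located for snapshot
# or removal before re-running sstablescrub.
DATA_DIR="${DATA_DIR:-/cassandra/data}"
find "$DATA_DIR/system" -name '*-CompressionInfo.db' -print 2>/dev/null || true
```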
> Offline scrub should not abort when it hits corruption
> ------------------------------------------------------
>
> Key: CASSANDRA-11750
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11750
> Project: Cassandra
> Issue Type: Bug
> Reporter: Adam Hattrell
> Assignee: Yuki Morishita
> Priority: Minor
> Labels: Tools
>
> Hit a failure on startup due to corruption of some SSTables in the system
> keyspace. Deleted the listed file and restarted - the node came down again,
> citing another file.
> Figured I may as well run scrub to clean up all the files. Got the
> following error:
> {noformat}
> sstablescrub system compaction_history
> ERROR 17:21:34 Exiting forcefully due to file system exception on startup, disk failure policy "stop"
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-1936-CompressionInfo.db
>
> at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:169) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:741) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:692) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:480) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:376) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:523) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_79]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
> Caused by: java.io.EOFException: null
> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.7.0_79]
> at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_79]
> at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_79]
> at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> ... 14 common frames omitted
> {noformat}
> I guess it might be by design - but I'd argue that I should at least have the
> option to continue and let it do its thing. I'd prefer that sstablescrub
> ignored the disk failure policy.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)