[
https://issues.apache.org/jira/browse/CASSANDRA-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Garvit Juniwal updated CASSANDRA-12728:
---------------------------------------
Attachment: CASSANDRA-12728.patch
If a hints file gets truncated due to a crash, the last hint might have
been written incompletely. Presence of a corrupted hints file can put
Cassandra in a crash loop trying o read a corrupted hint over and over.
Ignore and discard the last incomplete hint instead of throwing a
FSReadError exception.
> Handling partially written hint files
> -------------------------------------
>
> Key: CASSANDRA-12728
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12728
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sharvanath Pathak
> Assignee: Aleksey Yeschenko
> Attachments: CASSANDRA-12728.patch
>
>
> {noformat}
> ERROR [HintsDispatcher:1] 2016-09-28 17:44:43,397
> HintsDispatchExecutor.java:225 - Failed to dispatch hints file
> d5d7257c-9f81-49b2-8633-6f9bda6e3dea-1474892654160-1.hints: file is corrupted
> ({})
> org.apache.cassandra.io.FSReadError: java.io.EOFException
> at
> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:282)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:252)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:156)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:137)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:119)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:91)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:259)
> [apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:242)
> [apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:220)
> [apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:199)
> [apache-cassandra-3.0.6.jar:3.0.6]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_77]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [na:1.8.0_77]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_77]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_77]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> Caused by: java.io.EOFException: null
> at
> org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:68)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.hints.ChecksummedDataInput.readFully(ChecksummedDataInput.java:126)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.hints.HintsReader$BuffersIterator.readBuffer(HintsReader.java:310)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNextInternal(HintsReader.java:301)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:278)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> ... 15 common frames omitted
> {noformat}
> We've found out that the hint file was truncated because there was a hard
> reboot around the time of last write to the file. I think we basically need
> to handle partially written hint files. Also, the CRC file does not exist in
> this case (probably because it crashed while writing the hints file). May be
> ignoring and cleaning up such partially written hint files can be a way to
> fix this?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)