[
https://issues.apache.org/jira/browse/CASSANDRA-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709858#comment-15709858
]
Benjamin Roth edited comment on CASSANDRA-12728 at 11/30/16 9:38 PM:
---------------------------------------------------------------------
+1
Let the operator decide whether they prefer a crash or an inconsistency. If the
node does not crash, the failure should be logged as an error, so you can check
the error logs and, instead of having to recover from a crash, start a repair if
desired. The only recovery action one can take is a repair anyway. The only
question is how to fail and how to get notified.
If the node crashes and the operator notices too late, the situation may become
even worse when hints expire.
The crash doesn't necessarily happen on startup. It may occur much later if
there are a lot of hints and only the very last file is broken.
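To make the "let the operator decide" idea concrete, here is a minimal sketch of a configurable policy. The names CorruptHintsPolicy, CorruptHintsHandler, DIE and LOG_AND_SKIP are hypothetical and chosen only for illustration; they are not existing Cassandra settings or classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical policy names for illustration; not an existing Cassandra option. */
enum CorruptHintsPolicy { DIE, LOG_AND_SKIP }

final class CorruptHintsHandler {
    private final CorruptHintsPolicy policy;
    // Names of files that were skipped, so the operator can decide whether to repair.
    private final List<String> skippedFiles = new ArrayList<>();

    CorruptHintsHandler(CorruptHintsPolicy policy) {
        this.policy = policy;
    }

    /**
     * Called when a hints file cannot be read.
     * Returns true if dispatch may continue with the next file;
     * throws if the operator chose to fail hard.
     */
    boolean onCorruptFile(String fileName, Exception cause) {
        if (policy == CorruptHintsPolicy.DIE) {
            throw new IllegalStateException("Corrupted hints file " + fileName, cause);
        }
        // Log at error level so the problem shows up in the error logs.
        System.err.println("ERROR: skipping corrupted hints file " + fileName
                + " (" + cause + "); consider running a repair");
        skippedFiles.add(fileName);
        return true;
    }

    List<String> skippedFiles() {
        return skippedFiles;
    }
}
```

With LOG_AND_SKIP the node keeps running and the skipped file names stay available for a later repair decision; with DIE the behaviour matches a hard failure.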
> Handling partially written hint files
> -------------------------------------
>
> Key: CASSANDRA-12728
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12728
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sharvanath Pathak
> Assignee: Aleksey Yeschenko
> Labels: lhf
> Attachments: CASSANDRA-12728.patch
>
>
> {noformat}
> ERROR [HintsDispatcher:1] 2016-09-28 17:44:43,397 HintsDispatchExecutor.java:225 - Failed to dispatch hints file d5d7257c-9f81-49b2-8633-6f9bda6e3dea-1474892654160-1.hints: file is corrupted ({})
> org.apache.cassandra.io.FSReadError: java.io.EOFException
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:282) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:252) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:156) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:137) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:119) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:91) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:259) [apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:242) [apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:220) [apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:199) [apache-cassandra-3.0.6.jar:3.0.6]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_77]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_77]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> Caused by: java.io.EOFException: null
>     at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:68) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.ChecksummedDataInput.readFully(ChecksummedDataInput.java:126) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.readBuffer(HintsReader.java:310) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNextInternal(HintsReader.java:301) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:278) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     ... 15 common frames omitted
> {noformat}
> We found that the hints file was truncated because there was a hard
> reboot around the time of the last write to the file. I think we basically need
> to handle partially written hint files. Also, the CRC file does not exist in
> this case (probably because the node crashed while writing the hints file).
> Maybe ignoring and cleaning up such partially written hint files would be a
> way to fix this?
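
The idea of tolerating a partially written file can be sketched as below. This is only an illustration: it assumes a simple [int length][payload] record layout, which is NOT the actual hints file format, and TruncationTolerantReader is a hypothetical class name. It treats an EOFException in the middle of the last record as "the writer was interrupted" and keeps every record that was fully written before it.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class TruncationTolerantReader {
    /**
     * Reads records laid out as [int length][payload bytes].
     * The layout is an assumption for illustration, not the real hints format.
     * A truncated trailing record is dropped; all complete records are returned.
     */
    static List<byte[]> readCompleteRecords(InputStream in) throws IOException {
        List<byte[]> records = new ArrayList<>();
        DataInputStream data = new DataInputStream(in);
        while (true) {
            try {
                int length = data.readInt();
                if (length < 0) {
                    // A negative length is real corruption, not truncation.
                    throw new IOException("Negative record length: " + length);
                }
                byte[] payload = new byte[length];
                data.readFully(payload);
                records.add(payload);
            } catch (EOFException e) {
                // Either a clean end of file or a partially written last record:
                // in both cases, stop and keep what was read so far.
                return records;
            }
        }
    }
}
```

A real implementation would also need to bound the record length and verify checksums, since truncation is only one of the ways a file can be damaged.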