[
https://issues.apache.org/jira/browse/ACCUMULO-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548829#comment-13548829
]
Eric Newton commented on ACCUMULO-716:
--------------------------------------
So, let's say I write out 50 bytes to the WAL. That's not enough to have a
checksum yet. I sync it to disk, so it's a clean write. Then I write another
50K, which attempts to write a checksum, but I get a full-disk error. The file
is not sync'd, and the client is never told that the second set of mutations
was saved. But now we have a WAL that contains some good mutations, which we
need to recover, and a checksum error near the end of the file. Unfortunately,
we just blow out with an error and do not recover the 50 bytes.

Fortunately, it looks like you can recover the log if you make a copy of it and
move it into place.
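
To make the failure mode concrete, here is a minimal sketch of the behavior
recovery ought to have: read records until the first truncated or corrupt one,
and keep everything verified before it. This is a toy format (4-byte length,
4-byte CRC32, payload) invented for illustration. It is not Accumulo's WAL
format or the HDFS checksum scheme, and the names append_record and
recover_prefix are hypothetical.

```python
import struct
import zlib

# Toy record header (an assumption for this sketch): payload length, CRC32.
HEADER = struct.Struct(">II")


def append_record(buf: bytearray, payload: bytes) -> None:
    """Append one length-prefixed, checksummed record to the toy log."""
    buf += HEADER.pack(len(payload), zlib.crc32(payload))
    buf += payload


def recover_prefix(data: bytes) -> list:
    """Return every record that precedes the first truncated or corrupt one."""
    records = []
    pos = 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # damaged tail: stop here, keep what was already verified
        records.append(payload)
        pos = start + length
    return records


log = bytearray()
append_record(log, b"a" * 50)       # small write, synced cleanly
append_record(log, b"b" * 50_000)   # large write that fails part-way
damaged = bytes(log[:-1000])        # simulate the tail never reaching disk
recovered = recover_prefix(damaged)
```

With the damaged input above, recover_prefix keeps the 50-byte record and
drops the incomplete 50K one. That is safe here because the client was never
told the second write succeeded.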
> Corrupt WAL file
> ----------------
>
> Key: ACCUMULO-716
> URL: https://issues.apache.org/jira/browse/ACCUMULO-716
> Project: Accumulo
> Issue Type: Bug
> Components: tserver
> Environment: java version "1.6.0_33", hadoop-0.20.2-cdh3u3
> Reporter: Josh Elser
> Assignee: Eric Newton
>
> Ran wikisearch-ingest. Ended up filling up a drive used by HDFS and things
> failed not-so-gracefully. Upon restart, log recovery started, appeared to
> finish (failed HDFS checksum on one WAL entry), and left Accumulo in a state
> where no tablets were assigned.