[jira] [Commented] (ACCUMULO-716) Corrupt WAL file

Eric Newton (JIRA) Wed, 09 Jan 2013 11:02:13 -0800

    [ 
https://issues.apache.org/jira/browse/ACCUMULO-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548829#comment-13548829
 ]


Eric Newton commented on ACCUMULO-716:
--------------------------------------

So, let's say I write out 50 bytes to the WAL.  That's not enough to have a 
checksum yet.  I sync it to disk, so it's a clean write.  Then I write another 
50K, which attempts to write a checksum, but I get a full-disk error.  The file 
is not sync'd, and the client is never told that the second set of mutations 
was saved.  But now we have a WAL that contains some good mutations, which we 
need to recover, and a checksum error near the end of the file.  Unfortunately, 
we just blow up with an error and do not recover the first 50 bytes.
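
As a rough illustration of the behavior we would want instead, here is a 
minimal sketch of a reader that keeps every record before the first damaged 
one and treats the damaged tail as the end of the log.  The record layout 
(4-byte length, 4-byte CRC32, payload) and the names append_record and 
recover_prefix are made up for this example; this is not Accumulo's WAL format 
or the HDFS checksum mechanism.

```python
import struct
import zlib


def append_record(buf: bytearray, payload: bytes) -> None:
    """Append one record: 4-byte length, 4-byte CRC32, then the payload.

    Hypothetical layout, chosen only for this illustration.
    """
    buf += struct.pack(">II", len(payload), zlib.crc32(payload))
    buf += payload


def recover_prefix(data: bytes) -> list[bytes]:
    """Return every record up to the first truncated or corrupt one."""
    records = []
    pos = 0
    while pos + 8 <= len(data):
        length, crc = struct.unpack_from(">II", data, pos)
        payload = data[pos + 8 : pos + 8 + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # damaged tail: stop, but keep what was already read
        records.append(payload)
        pos += 8 + length
    return records


# Replay the scenario: a small synced write, then a large write cut short.
log = bytearray()
append_record(log, b"x" * 50)            # first write, synced cleanly
good_len = len(log)
append_record(log, b"y" * 50_000)        # second write, never acknowledged
damaged = bytes(log[: good_len + 1000])  # disk filled part-way through

recovered = recover_prefix(damaged)      # only the first record survives
```

Stopping at the first bad record is only reasonable here because the damage is 
at the tail and the client was never told the second write succeeded; a 
checksum failure in the middle of a log would need different handling.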

Fortunately, it looks like you can recover the log if you make a copy of it and 
move it into place.


                
> Corrupt WAL file
> ----------------
>
>                 Key: ACCUMULO-716
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-716
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>         Environment: java version "1.6.0_33", hadoop-0.20.2-cdh3u3
>            Reporter: Josh Elser
>            Assignee: Eric Newton
>
> Ran wikisearch-ingest. Ended up filling up a drive used by HDFS and things 
> failed not-so-gracefully. Upon restart, log recovery started, appeared to 
> finish (failed HDFS checksum on one WAL entry), and left Accumulo in a state 
> where no tablets were assigned.

