[ 
https://issues.apache.org/jira/browse/HDFS-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-1378:
------------------------------

    Attachment: hdfs-1378-branch20.txt

Here's a patch for branch-20, not for commit.

In trunk, the edit log loading code has been refactored to receive a 
DataInputStream directly, so the same approach won't work as-is. I'd like 
to change EditLogInputStream to return a plain InputStream rather than a 
DataInputStream so that we can wrap it in a position tracker, as this patch 
does.
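
For anyone who hasn't opened the attachment, here is a rough sketch of what I mean by a position tracker. This is not the code from the patch: the class name PositionTrackingInputStream and the getPos() method are made up for illustration. It only assumes the loader can be handed a plain InputStream, which it then wraps in its own DataInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper that counts the bytes consumed from the wrapped stream. */
class PositionTrackingInputStream extends FilterInputStream {
    private long pos = 0;

    PositionTrackingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            pos++; // one byte consumed
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            pos += n; // count only the bytes actually read
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        if (skipped > 0) {
            pos += skipped;
        }
        return skipped;
    }

    // mark/reset are disabled so the byte count can never drift backwards.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void mark(int readLimit) {
        // intentionally a no-op
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    /** Offset of the next byte to be read, relative to where wrapping began. */
    long getPos() {
        return pos;
    }
}

class PositionTrackerDemo {
    public static void main(String[] args) throws IOException {
        PositionTrackingInputStream tracker =
            new PositionTrackingInputStream(new ByteArrayInputStream(new byte[16]));
        DataInputStream in = new DataInputStream(tracker);
        in.readByte(); // 1 byte
        in.readInt();  // 4 bytes
        System.out.println(tracker.getPos()); // prints 5
    }
}
```

The tracker has to sit underneath the DataInputStream rather than on top of it, which is why the loader needs to receive the raw InputStream.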

Here's example output from an edit log that got corrupted due to the root disk 
running out of space:
{noformat}
10/09/06 11:02:30 ERROR common.Storage: Error replaying edit log at offset 1698779
10/09/06 11:02:30 ERROR common.Storage: Last 4 opcodes at offsets: 1629141 1629329 1629546 1698775
10/09/06 11:02:30 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: Incorrect data format. logVersion is -18 but writables.length is 0.
{noformat}
From here it's very easy to use {{bvi}} to figure out where truncation or 
corruption occurred and fix it up.
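
The "last N opcodes" bookkeeping could look roughly like the sketch below. Again, this is illustrative and not lifted from the patch: the RecentOffsets class and its record() and describe() methods are hypothetical names. The idea is that the replay loop records the stream position just before reading each opcode, and the error handler prints whatever offsets are still in the buffer.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical fixed-size history of the offsets at which recent opcodes started. */
class RecentOffsets {
    private final int capacity;
    private final Deque<Long> offsets = new ArrayDeque<>();

    RecentOffsets(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Call with the current stream position just before reading each opcode. */
    void record(long offset) {
        if (offsets.size() == capacity) {
            offsets.removeFirst(); // drop the oldest entry
        }
        offsets.addLast(offset);
    }

    /** Builds a two-line message modeled on the example output above. */
    String describe(long failureOffset) {
        StringBuilder sb = new StringBuilder();
        sb.append("Error replaying edit log at offset ").append(failureOffset).append('\n');
        sb.append("Last ").append(offsets.size()).append(" opcodes at offsets:");
        for (long offset : offsets) {
            sb.append(' ').append(offset);
        }
        return sb.toString();
    }
}
```

Keeping only a handful of offsets costs next to nothing during replay, and the last entry points at the opcode that was being read when things went wrong.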

> Edit log replay should track and report file offsets in case of errors
> ----------------------------------------------------------------------
>
>                 Key: HDFS-1378
>                 URL: https://issues.apache.org/jira/browse/HDFS-1378
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-1378-branch20.txt
>
>
> Occasionally there are bugs or operational mistakes that result in corrupt 
> edit logs which I end up having to repair by hand. In these cases it would be 
> very handy to have the error message also print out the file offsets of the 
> last several edit log opcodes so it's easier to find the right place to edit 
> in the OP_INVALID marker. We could also use this facility to provide a rough 
> estimate of how far along the NN is in edit log replay during startup (handy 
> when a 2NN has died and replay takes a while).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
