[ 
https://issues.apache.org/jira/browse/HDFS-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016170#comment-13016170
 ] 

Aaron T. Myers commented on HDFS-1378:
--------------------------------------

Patch looks pretty solid, Todd, and very helpful. One comment:

Large classes of edit log corruption result in an exception other than an 
IOException being thrown, but this debugging info is only printed when an IOE 
is thrown. I've twice now had to change this code to catch NPE and recompile 
to get it to print this info. Ideally, I think we'd put this in a 
"{{catch (Throwable t)}}" block and re-throw the original exception after 
printing.
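
A minimal sketch of that suggestion, for illustration only. It is not taken 
from the attached patch or from the NameNode code: the class name, the 
one-byte opcode format, the opcode table, and the size of the offset window 
are all hypothetical. It shows the recent opcode offsets being recorded during 
replay and reported from a catch-all block, with the original exception 
re-thrown unchanged so that an NPE is handled the same way as an IOE.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the edit log loader; not the real HDFS class.
class EditLogReplaySketch {
    // Assumption: remember the offsets of the last 4 opcodes read.
    static final int RECENT = 4;

    // Toy opcode table. Index 3 is null to simulate corruption that
    // surfaces as a NullPointerException rather than an IOException.
    static final String[] OPS = {"OP_ADD", "OP_RENAME", "OP_DELETE", null};

    static void apply(byte opcode) {
        String name = OPS[opcode];
        name.length(); // throws NPE when the table entry is null
    }

    /** Replays a toy log of one-byte opcodes; returns the number applied. */
    static int replay(byte[] log, StringBuilder diagnostics) throws IOException {
        Deque<Long> recentOffsets = new ArrayDeque<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        long offset = 0;   // offset of the next byte to read
        long opStart = 0;  // offset where the current opcode starts
        int applied = 0;
        try {
            while (in.available() > 0) {
                opStart = offset;
                recentOffsets.addLast(opStart);
                if (recentOffsets.size() > RECENT) {
                    recentOffsets.removeFirst();
                }
                byte opcode = in.readByte();
                offset++;
                apply(opcode);
                applied++;
            }
        } catch (Throwable t) {
            // Report for every failure type, then re-throw the original.
            diagnostics.append("Error replaying edit log at offset ")
                       .append(opStart)
                       .append("; recent opcode offsets: ")
                       .append(recentOffsets);
            throw t; // precise rethrow: IOException or an unchecked throwable
        }
        return applied;
    }

    public static void main(String[] args) {
        StringBuilder diagnostics = new StringBuilder();
        try {
            replay(new byte[] {0, 1, 2, 0, 1, 3, 0}, diagnostics);
        } catch (Throwable t) {
            System.err.println(diagnostics);
            System.err.println("Re-thrown: " + t);
        }
    }
}
```

Because the caught variable is never reassigned, the compiler (Java 7 and 
later) lets {{throw t}} propagate without widening the method's {{throws}} 
clause, so callers still see the original exception type.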

> Edit log replay should track and report file offsets in case of errors
> ----------------------------------------------------------------------
>
>                 Key: HDFS-1378
>                 URL: https://issues.apache.org/jira/browse/HDFS-1378
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-1378-branch20.txt
>
>
> Occasionally there are bugs or operational mistakes that result in corrupt 
> edit logs which I end up having to repair by hand. In these cases it would be 
> very handy to have the error message also print out the file offsets of the 
> last several edit log opcodes so it's easier to find the right place to edit 
> in the OP_INVALID marker. We could also use this facility to provide a rough 
> estimate of how far along edit log replay the NN is during startup (handy 
> when a 2NN has died and replay takes a while)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
