[ https://issues.apache.org/jira/browse/HDFS-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757437#comment-13757437 ]

Aaron T. Myers commented on HDFS-5159:
--------------------------------------

Example error in the logs if a 2NN experiences this:

{noformat}
2013-08-23 00:10:30,849 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
java.io.IOException: There appears to be a gap in the edit log.  We expected txid 1, but got txid 1164889243.
  at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
  at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:159)
  at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:715)
  at org.apache.hadoop.hdfs.server.namenode.Checkpointer.rollForwardByApplyingLogs(Checkpointer.java:296)
  at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:898)
  at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:485)
  at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:343)
  at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:310)
  at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
  at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:306)
  at java.lang.Thread.run(Thread.java:662)
{noformat}
                
> Secondary NameNode fails to checkpoint if error occurs downloading edits on first checkpoint
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5159
>                 URL: https://issues.apache.org/jira/browse/HDFS-5159
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.1.0-beta
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>
> The 2NN will avoid downloading/loading a new fsimage if its local copy of 
> fsimage is the same as the version on the NN. However, the decision to *load* 
> the fsimage from disk into memory is based only on the on-disk fsimage 
> version. If an error occurs between downloading and loading the fsimage on 
> the first checkpoint attempt, the fsimage is left on disk but is never 
> loaded into memory. On subsequent checkpoint attempts the on-disk version 
> already matches the NN, so the 2NN again skips loading the on-disk fsimage 
> and thus never checkpoints successfully.
> An example error message is in the first comment of this ticket.
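
The failure mode described in the issue can be illustrated with a small toy model. This is only a sketch and not the Hadoop code: the class CheckpointSketch, its fields, the loadIfMemoryStale flag, and the error text are all hypothetical names made up for illustration, and the "remedy" branch is just one plausible way to avoid the problem, not necessarily the fix applied for this ticket.

```java
import java.io.IOException;

// Toy model of the failure mode; this is NOT the Hadoop implementation.
// Every class, field, and method name below is hypothetical.
class CheckpointSketch {
    long nnImageTxId;                // txid covered by the newest fsimage on the NN
    long onDiskImageTxId = -1;       // txid of the 2NN's local fsimage (-1: none yet)
    long lastAppliedTxId = 0;        // txid reached by the 2NN's in-memory namespace
    final boolean loadIfMemoryStale; // assumed remedy, see doCheckpoint

    CheckpointSketch(long nnImageTxId, boolean loadIfMemoryStale) {
        this.nnImageTxId = nnImageTxId;
        this.loadIfMemoryStale = loadIfMemoryStale;
    }

    // One checkpoint attempt. failAfterDownload simulates an error that
    // happens after the image is on disk but before it is loaded.
    void doCheckpoint(boolean failAfterDownload) throws IOException {
        boolean downloaded = false;
        if (onDiskImageTxId != nnImageTxId) {
            onDiskImageTxId = nnImageTxId; // "download" the image to disk
            downloaded = true;
        }
        if (failAfterDownload) {
            throw new IOException("simulated failure between download and load");
        }
        // Behavior as described in the report: the load decision depends
        // only on whether the on-disk image changed during this attempt.
        boolean load = downloaded;
        // Assumed remedy: also load when memory is behind the on-disk image.
        if (loadIfMemoryStale && lastAppliedTxId < onDiskImageTxId) {
            load = true;
        }
        if (load) {
            lastAppliedTxId = onDiskImageTxId;
        }
        // The edits to apply start right after the NN's image.
        long firstEditTxId = nnImageTxId + 1;
        if (firstEditTxId != lastAppliedTxId + 1) {
            throw new IOException("gap in edits: expected txid "
                + (lastAppliedTxId + 1) + ", but got txid " + firstEditTxId);
        }
        lastAppliedTxId = firstEditTxId;   // apply a single edit
        onDiskImageTxId = lastAppliedTxId; // save the merged image locally
        nnImageTxId = lastAppliedTxId;     // and "upload" it to the NN
    }

    // Runs one attempt and reports whether it succeeded.
    boolean tryCheckpoint(boolean failAfterDownload) {
        try {
            doCheckpoint(failAfterDownload);
            System.out.println("  checkpoint ok, lastAppliedTxId=" + lastAppliedTxId);
            return true;
        } catch (IOException e) {
            System.out.println("  checkpoint failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        for (boolean remedy : new boolean[] {false, true}) {
            CheckpointSketch s = new CheckpointSketch(100, remedy);
            System.out.println("loadIfMemoryStale=" + remedy);
            s.tryCheckpoint(true);  // first attempt fails after the download
            s.tryCheckpoint(false); // later attempts hit no injected error
            s.tryCheckpoint(false);
        }
    }
}
```

With loadIfMemoryStale set to false, every attempt after the injected failure ends in the toy "gap in edits" error, because the on-disk image already matches the NN and so is never loaded. With the flag set to true, the second attempt loads the on-disk image and the checkpoint goes through.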

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
