[ https://issues.apache.org/jira/browse/HDFS-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757437#comment-13757437 ]
Aaron T. Myers commented on HDFS-5159:
--------------------------------------
Example error in the logs if a 2NN experiences this:
{noformat}
2013-08-23 00:10:30,849 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint
java.io.IOException: There appears to be a gap in the edit log. We expected txid 1, but got txid 1164889243.
    at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:159)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:715)
    at org.apache.hadoop.hdfs.server.namenode.Checkpointer.rollForwardByApplyingLogs(Checkpointer.java:296)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:898)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:485)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:343)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:310)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:452)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:306)
    at java.lang.Thread.run(Thread.java:662)
{noformat}
> Secondary NameNode fails to checkpoint if error occurs downloading edits on first checkpoint
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-5159
> URL: https://issues.apache.org/jira/browse/HDFS-5159
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.1.0-beta
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
>
> The 2NN will avoid downloading/loading a new fsimage if its local copy of
> fsimage is the same as the version on the NN. However, the decision to *load*
> the fsimage from disk into memory is based only on the on-disk fsimage
> version. If an error occurs between downloading and loading the fsimage on
> the first checkpoint attempt, the fsimage ends up on disk but is never loaded
> into memory. On subsequent checkpoint attempts the on-disk fsimage already
> matches the version on the NN, so the 2NN skips loading it again and thus
> never checkpoints successfully.
> An example error message is in the first comment of this ticket.
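
The failure sequence described above can be modelled with a small toy class. This is only a sketch under stated assumptions: CheckpointSketch, its fields, and both needsReload methods are hypothetical names invented for illustration and are not the real SecondaryNameNode code. The "fixed" check is one possible guard, not necessarily the change made for this ticket. The txids (100, 101, 110) are made up, and the sketch assumes the edit loader expects the txid that follows the last one applied in memory, which is consistent with the "expected txid 1" message in the log.

```java
import java.io.IOException;

/** Toy model of the 2NN checkpoint decision; NOT the real Hadoop classes. */
final class CheckpointSketch {
    // Hypothetical state, named for illustration only.
    long onDiskImageTxId = -1;   // txid of the fsimage file on local disk, -1 if none
    long inMemoryImageTxId = -1; // txid of the fsimage loaded into memory, -1 if none
    long lastAppliedTxId = 0;    // last edit applied to the in-memory namespace

    /** Decision as described in the ticket: reload only if a new image was downloaded. */
    boolean needsReloadBuggy(boolean downloadedNewImage) {
        return downloadedNewImage;
    }

    /** One possible guard (an assumption): also reload if memory does not match disk. */
    boolean needsReloadFixed(boolean downloadedNewImage) {
        return downloadedNewImage || inMemoryImageTxId != onDiskImageTxId;
    }

    void checkpoint(long nnImageTxId, long firstEditTxId, long lastEditTxId,
                    boolean failDownloadingEdits, boolean useFix) throws IOException {
        // Step 1: download the fsimage only if the local copy differs from the NN's.
        boolean downloaded = false;
        if (onDiskImageTxId != nnImageTxId) {
            onDiskImageTxId = nnImageTxId;
            downloaded = true;
        }
        // Step 2: download edits; this is where the first attempt is made to fail.
        if (failDownloadingEdits) {
            throw new IOException("simulated failure downloading edits");
        }
        // Step 3: decide whether to load the on-disk fsimage into memory.
        boolean reload = useFix ? needsReloadFixed(downloaded) : needsReloadBuggy(downloaded);
        if (reload) {
            inMemoryImageTxId = onDiskImageTxId;
            lastAppliedTxId = onDiskImageTxId;
        }
        // Step 4: apply edits; the first txid must follow the last applied one.
        long expected = lastAppliedTxId + 1;
        if (firstEditTxId != expected) {
            throw new IOException("There appears to be a gap in the edit log. We expected txid "
                + expected + ", but got txid " + firstEditTxId + ".");
        }
        lastAppliedTxId = lastEditTxId;
    }

    public static void main(String[] args) {
        for (boolean useFix : new boolean[] {false, true}) {
            CheckpointSketch twoNN = new CheckpointSketch();
            try {
                twoNN.checkpoint(100, 101, 110, true, useFix); // first attempt fails
            } catch (IOException e) {
                System.out.println("first attempt failed: " + e.getMessage());
            }
            try {
                twoNN.checkpoint(100, 101, 110, false, useFix); // retry
                System.out.println("useFix=" + useFix + ": checkpointed through txid "
                    + twoNN.lastAppliedTxId);
            } catch (IOException e) {
                System.out.println("useFix=" + useFix + ": " + e.getMessage());
            }
        }
    }
}
```

In this model the retry with the buggy check fails every time, because the image is already on disk, nothing is loaded into memory, and the loader still expects txid 1. With the extra guard the retry loads the on-disk image first and the edits apply cleanly.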
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira