[
https://issues.apache.org/jira/browse/HDFS-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758408#comment-13758408
]
Jing Zhao commented on HDFS-5159:
---------------------------------
The patch looks pretty good to me. I've also verified that without the fix the
new unit test will fail. The only nit is that maybe it's better to avoid
calling cluster#getNamesystem three times here:
{code}
+ cluster.getNamesystem().enterSafeMode(false);
+ cluster.getNamesystem().saveNamespace();
+ cluster.getNamesystem().leaveSafeMode();
{code}
+1 after this is addressed.
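For illustration, the suggestion amounts to fetching the namesystem once and reusing the reference. The sketch below is self-contained and uses hypothetical stand-in classes (`Cluster`, `Namesystem`, `saveNamespaceOnce`) rather than the real MiniDFSCluster and FSNamesystem, so it only shows the shape of the change, not the actual test code in the patch:

```java
import java.util.ArrayList;
import java.util.List;

public class SingleLookupSketch {
    // Hypothetical stand-in for the namesystem; it only records which calls were made.
    static class Namesystem {
        final List<String> calls = new ArrayList<>();
        void enterSafeMode(boolean resourcesLow) { calls.add("enterSafeMode"); }
        void saveNamespace() { calls.add("saveNamespace"); }
        void leaveSafeMode() { calls.add("leaveSafeMode"); }
    }

    // Hypothetical stand-in for the test cluster; it counts getNamesystem() lookups.
    static class Cluster {
        private final Namesystem namesystem = new Namesystem();
        int lookups = 0;
        Namesystem getNamesystem() {
            lookups++;
            return namesystem;
        }
    }

    // Look the namesystem up once, then reuse the local reference for all three calls.
    static void saveNamespaceOnce(Cluster cluster) {
        Namesystem namesystem = cluster.getNamesystem();
        namesystem.enterSafeMode(false);
        namesystem.saveNamespace();
        namesystem.leaveSafeMode();
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        saveNamespaceOnce(cluster);
        int lookups = cluster.lookups;
        System.out.println("getNamesystem() lookups: " + lookups);
    }
}
```

In the real test the same three calls would presumably be made on a local variable holding the result of cluster#getNamesystem.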
> Secondary NameNode fails to checkpoint if error occurs downloading edits on
> first checkpoint
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-5159
> URL: https://issues.apache.org/jira/browse/HDFS-5159
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.1.0-beta
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
> Attachments: HDFS-5159.patch
>
>
> The 2NN will avoid downloading/loading a new fsimage if its local copy of
> fsimage is the same as the version on the NN. However, the decision to *load*
> the fsimage from disk into memory is based only on the on-disk fsimage
> version. If an error occurs between downloading and loading the fsimage on
> the first checkpoint attempt, the fsimage is never loaded into memory. On
> subsequent checkpoint attempts the 2NN sees that its on-disk fsimage already
> matches the NN, skips the load again, and thus never checkpoints successfully.
> An example error message is in the first comment of this ticket.
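
The failure mode in the description above can be modeled with a small, self-contained sketch. Everything here is hypothetical (the `Secondary` class, its fields, and the `trackInMemoryState` flag are illustrative names, not the real SecondaryNameNode code), and the "tracking" variant is only one plausible way to avoid the problem, not necessarily what HDFS-5159.patch does:

```java
public class CheckpointLoadSketch {
    // Simplified, hypothetical model of a 2NN; not the real SecondaryNameNode.
    static class Secondary {
        long onDiskTxId = -1;    // version of the fsimage file on local disk
        long inMemoryTxId = -1;  // version of the fsimage currently loaded in memory
        final boolean trackInMemoryState;

        Secondary(boolean trackInMemoryState) {
            this.trackInMemoryState = trackInMemoryState;
        }

        // Returns true if the checkpoint attempt ends with the NN's image loaded.
        boolean checkpoint(long nnTxId, boolean failAfterDownload) {
            // Download only when the local on-disk copy differs from the NN's version.
            boolean downloaded = onDiskTxId != nnTxId;
            if (downloaded) {
                onDiskTxId = nnTxId;
            }
            if (failAfterDownload) {
                return false; // error between downloading and loading
            }
            // Buggy variant: load only if a download just happened.
            // Tracking variant (an assumption): also load when memory lags the disk.
            boolean load = downloaded
                    || (trackInMemoryState && inMemoryTxId != onDiskTxId);
            if (load) {
                inMemoryTxId = onDiskTxId;
            }
            return inMemoryTxId == nnTxId;
        }
    }

    public static void main(String[] args) {
        Secondary buggy = new Secondary(false);
        buggy.checkpoint(100, true); // first attempt fails after the download
        System.out.println("buggy retry succeeded: " + buggy.checkpoint(100, false));

        Secondary tracking = new Secondary(true);
        tracking.checkpoint(100, true);
        System.out.println("tracking retry succeeded: " + tracking.checkpoint(100, false));
    }
}
```

In this model the buggy variant can never recover once the on-disk copy matches the NN, because nothing records that the in-memory state is still stale.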