[ https://issues.apache.org/jira/browse/HDFS-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560118#comment-13560118 ]

Chris Nauroth commented on HDFS-4423:
-------------------------------------

Thank you for the detailed write-up, [~chenfolin].  I have one additional 
question.  You mentioned an exception causing the {{NameNode}} to shut down 
during a checkpoint, after writing the latest name checkpoint time but before 
writing the latest edits checkpoint time.  Do you have details on that 
exception?  Was it related to this bug, or was it something unrelated that 
merely exposed this problem in the {{loadFSImage}} logic?

Your assessment about the call to {{FSDirectory#updateCountForINodeWithQuota}} 
looks correct.  I'm thinking that we should move that call out of 
{{FSImage#loadFSEdits}} and into {{FSImage#loadFSImage}}, so that the end of 
{{loadFSImage}} would look like this:

{code}
boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
...
  // Load latest edits
  if (latestNameCheckpointTime > latestEditsCheckpointTime) {
    // the image is already current, discard edits
    needToSave = true;
  } else { // latestNameCheckpointTime == latestEditsCheckpointTime
    needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);
  }

  // Update the quota counts regardless of which branch ran.
  FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota();
  return needToSave;
}
{code}

Moving the call there would help guarantee that it always happens.
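The failure mode behind this fix can be illustrated with a minimal sketch 
(hypothetical {{Node}} type, not the actual HDFS {{INode}} classes): a 
directory keeps a cached namespace count that defaults to 1, and unless a 
recomputation pass runs after loading, that default is what gets persisted.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotaCountSketch {
    static class Node {
        final List<Node> children = new ArrayList<>();
        // Cached count of this node plus all descendants. Defaults to 1
        // (only itself), analogous to rootDir's nsCount before update.
        long nsCount = 1;

        // Recompute the cached count by walking the subtree, the way
        // updateCountForINodeWithQuota conceptually refreshes quota counts.
        long updateCount() {
            long total = 1; // this node itself
            for (Node child : children) {
                total += child.updateCount();
            }
            nsCount = total;
            return nsCount;
        }
    }

    public static void main(String[] args) {
        Node root = new Node();
        Node dir = new Node();
        dir.children.add(new Node());
        dir.children.add(new Node());
        root.children.add(dir);

        // Before recomputation, the cached count is still the default "1"
        // that would be written into the fsimage on the buggy code path.
        System.out.println(root.nsCount); // 1
        root.updateCount();
        System.out.println(root.nsCount); // 4
    }
}
```

On the code path where edits are discarded, nothing ever calls the 
recomputation, so the stale default is what {{saveNamespace}} persists; 
placing the call at the end of {{loadFSImage}} makes it unconditional.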

                
> Checkpoint exception causes fatal damage to fsimage.
> ----------------------------------------------------
>
>                 Key: HDFS-4423
>                 URL: https://issues.apache.org/jira/browse/HDFS-4423
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 1.0.4, 1.1.1
>         Environment: CentOS 6.2
>            Reporter: ChenFolin
>            Priority: Blocker
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The affected class is org.apache.hadoop.hdfs.server.namenode.FSImage.
> {code}
> boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
> ...
> latestNameSD.read();
>     needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
>     LOG.info("Image file of size " + imageSize + " loaded in " 
>         + (FSNamesystem.now() - startTime)/1000 + " seconds.");
>     
>     // Load latest edits
>     if (latestNameCheckpointTime > latestEditsCheckpointTime)
>       // the image is already current, discard edits
>       needToSave |= true;
>     else // latestNameCheckpointTime == latestEditsCheckpointTime
>       needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);
>     
>     return needToSave;
>   }
> {code}
> In the normal checkpoint flow, latestNameCheckpointTime is equal to 
> latestEditsCheckpointTime, so the "else" branch executes.
> The problem arises when latestNameCheckpointTime > latestEditsCheckpointTime:
> The SecondaryNameNode starts a checkpoint,
> ...
> NameNode: rollFSImage; the NameNode shuts down after writing 
> latestNameCheckpointTime but before writing latestEditsCheckpointTime.
> On NameNode restart: because latestNameCheckpointTime > 
> latestEditsCheckpointTime, needToSave is true, but "rootDir"'s nsCount (the 
> cluster's file count) is never updated (that update runs inside loadFSEdits 
> via "FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota()"). 
> "saveNamespace" then writes the file count to the fsimage with its default 
> value "1".
> The next time, loadFSImage will fail.
> Maybe,it will work:
> {code}
> boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
> ...
> latestNameSD.read();
>     needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
>     LOG.info("Image file of size " + imageSize + " loaded in " 
>         + (FSNamesystem.now() - startTime)/1000 + " seconds.");
>     
>     // Load latest edits
>     if (latestNameCheckpointTime > latestEditsCheckpointTime){
>       // the image is already current, discard edits
>       needToSave |= true;
>       FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota();
>     }
>     else // latestNameCheckpointTime == latestEditsCheckpointTime
>       needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);
>     
>     return needToSave;
>   }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira