[ 
https://issues.apache.org/jira/browse/HDFS-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-4423:
--------------------------------

    Attachment: HDFS-4423-branch-1.1.patch

I'm uploading a patch that's similar to the original suggestion from 
[~chenfolin].

When I tried the approach I suggested in my last comment, it didn't work.  The 
reason is that {{SecondaryNameNode}} calls directly into 
{{FSImage#loadFSEdits}} and depends on that method to call 
{{FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota()}} as a side 
effect.  It's a less invasive change to add the call in the if block that 
handles the case where the image is already current.
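
A minimal sketch of that dependency, using a toy stand-in rather than the real Hadoop classes. The class name {{CheckpointLoadSketch}}, its fields, and the {{applyFix}} flag are hypothetical and exist only for illustration. The real method recomputes namespace and diskspace counts over the inode tree; the toy just caches the list size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy stand-in for the FSImage load path; NOT the real Hadoop code. */
class CheckpointLoadSketch {
    // Entries in the toy namespace, including the root.
    final List<String> entries = new ArrayList<String>();
    // Cached root count; assumed to default to 1 (root only) until recomputed.
    long cachedCount = 1;

    /** Hypothetical stand-in for updateCountForINodeWithQuota(). */
    void updateCount() {
        cachedCount = entries.size();
    }

    /** Stand-in for loadFSEdits: applies edits, then refreshes the count as a side effect. */
    int loadEdits(List<String> edits) {
        entries.addAll(edits);
        updateCount();
        return edits.size();
    }

    /** Stand-in for loadFSImage; returns the toy equivalent of needToSave. */
    boolean loadImage(List<String> image, List<String> edits,
                      boolean imageIsNewer, boolean applyFix) {
        entries.clear();
        entries.addAll(image);
        if (imageIsNewer) {
            // The image is already current, so edits are discarded.
            // Without the fix, nothing on this branch refreshes the count.
            if (applyFix) {
                updateCount();
            }
            return true;
        }
        return loadEdits(edits) > 0;
    }

    public static void main(String[] args) {
        List<String> image = Arrays.asList("/", "/a", "/b");
        List<String> noEdits = Collections.<String>emptyList();

        CheckpointLoadSketch unfixed = new CheckpointLoadSketch();
        unfixed.loadImage(image, noEdits, true, false);
        System.out.println("count without fix: " + unfixed.cachedCount);

        CheckpointLoadSketch fixed = new CheckpointLoadSketch();
        fixed.loadImage(image, noEdits, true, true);
        System.out.println("count with fix: " + fixed.cachedCount);
    }
}
```

Because the refresh stays inside {{loadEdits}}, a caller that invokes it directly (as {{SecondaryNameNode}} does with the real method) keeps getting the updated count.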

I've also added a test that simulates the error condition by running a cluster 
with separate directories for image and edits, forcing the fstime file for 
edits to contain 0, and then going through a series of restarts/checkpoints to 
make sure that it can still load the merged image.  Before I applied the change 
in {{FSImage}}, this test would fail with {{EOFException}} on the last restart, 
similar to what is described in the bug report.  After I applied the fix in 
{{FSImage}}, the test passed.
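
To illustrate why a stale count can surface as {{EOFException}} on a later load, here is a self-contained toy model. It is not the attached test and does not use the real fsimage layout. The class {{StaleCountImageSketch}} and its simplified format (a header count, the entries, then a trailer section with its own count) are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy image format; the layout is hypothetical, NOT the real fsimage format. */
class StaleCountImageSketch {

    /** Writes the header count, every entry, then a counted trailer section. */
    static byte[] save(long headerCount, List<String> entries, List<String> trailer)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(headerCount);       // may be stale if never recomputed
        for (String entry : entries) {
            out.writeUTF(entry);          // all entries are written regardless
        }
        out.writeInt(trailer.size());
        for (String item : trailer) {
            out.writeUTF(item);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Trusts the header count to find where the trailer section begins. */
    static List<String> load(byte[] image) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(image));
        long count = in.readLong();
        List<String> entries = new ArrayList<String>();
        for (long i = 0; i < count; i++) {
            entries.add(in.readUTF());
        }
        // With a stale count, these reads consume leftover entry bytes instead.
        int trailerCount = in.readInt();
        for (int i = 0; i < trailerCount; i++) {
            in.readUTF();
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        List<String> namespace = Arrays.asList("/", "/a", "/b");
        List<String> trailer = Collections.<String>emptyList();

        System.out.println("correct header: " + load(save(3, namespace, trailer)));
        try {
            load(save(1, namespace, trailer));  // header left at the default of 1
        } catch (EOFException e) {
            System.out.println("stale header: EOFException");
        }
    }
}
```

In this toy layout the writer emits every entry while the header still says 1, so the reader stops early, misreads entry bytes as the trailer count, and runs off the end of the stream.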

                
> Checkpoint exception causes fatal damage to fsimage.
> ----------------------------------------------------
>
>                 Key: HDFS-4423
>                 URL: https://issues.apache.org/jira/browse/HDFS-4423
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 1.0.4, 1.1.1
>         Environment: CentOS 6.2
>            Reporter: ChenFolin
>            Priority: Blocker
>         Attachments: HDFS-4423-branch-1.1.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The affected class is org.apache.hadoop.hdfs.server.namenode.FSImage:
> {code}
> boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
> ...
> latestNameSD.read();
>     needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
>     LOG.info("Image file of size " + imageSize + " loaded in " 
>         + (FSNamesystem.now() - startTime)/1000 + " seconds.");
>     
>     // Load latest edits
>     if (latestNameCheckpointTime > latestEditsCheckpointTime)
>       // the image is already current, discard edits
>       needToSave |= true;
>     else // latestNameCheckpointTime == latestEditsCheckpointTime
>       needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);
>     
>     return needToSave;
>   }
> {code}
> In the normal checkpoint flow, latestNameCheckpointTime is equal to 
> latestEditsCheckpointTime, so the "else" branch executes.
> The problem occurs when latestNameCheckpointTime > latestEditsCheckpointTime:
> SecondaryNameNode starts a checkpoint,
> ...
> NameNode: during rollFSImage, the NameNode shuts down after writing 
> latestNameCheckpointTime and before writing latestEditsCheckpointTime.
> NameNode restart: because latestNameCheckpointTime > 
> latestEditsCheckpointTime, needToSave is set to true, and "rootDir"'s 
> nsCount (the number of files in the cluster) is not updated, because that 
> update is performed in loadFSEdits by 
> "FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota()". 
> "saveNamespace" then writes the file count to fsimage with the default 
> value "1".
> The next loadFSImage will then fail.
> Maybe this will work:
> {code}
> boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
> ...
> latestNameSD.read();
>     needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
>     LOG.info("Image file of size " + imageSize + " loaded in " 
>         + (FSNamesystem.now() - startTime)/1000 + " seconds.");
>     
>     // Load latest edits
>     if (latestNameCheckpointTime > latestEditsCheckpointTime) {
>       // the image is already current, discard edits
>       needToSave |= true;
>       FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota();
>     } else { // latestNameCheckpointTime == latestEditsCheckpointTime
>       needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);
>     }
>     
>     return needToSave;
>   }
> {code}
