[ https://issues.apache.org/jira/browse/HDFS-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036702#comment-14036702 ]

Aaron T. Myers commented on HDFS-6563:
--------------------------------------

I've filed this as critical for now, but if folks think this should be a 
blocker I'm fine raising the priority.

Though the issue is critical, the bug itself is straightforward. In 
{{FSImageFormatPBINode#save(OutputStream, INodeFile)}} we have the following 
code:

{code}
        for (Block block : n.getBlocks()) {
          b.addBlocks(PBHelper.convert(block));
        }
{code}

It is not obvious, but this loop assumes that {{n.getBlocks()}} never returns 
{{null}}. In the scenario from the issue description, however, it can, because 
of this code in {{FileWithSnapshotFeature#collectBlocksBeyondMax}} (where {{n}} 
is the number of leading blocks to keep, not the {{INodeFile}} from the snippet 
above):

{code}
        final BlockInfo[] newBlocks;
        if (n == 0) {
          newBlocks = null;
        } else {
          newBlocks = new BlockInfo[n];
          System.arraycopy(oldBlocks, 0, newBlocks, 0, n);
        }
        
        // set new blocks
        file.setBlocks(newBlocks);
{code}
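To make the failure mode concrete, here is a minimal, self-contained Java sketch. It is not HDFS code: the {{Block}} class, {{truncate}}, and the two save methods are hypothetical stand-ins that mirror the two quoted snippets. It shows that when the number of retained blocks is 0 the truncation yields {{null}}, that an enhanced {{for}} loop over that {{null}} array throws {{NullPointerException}}, and one possible defensive variant of the loop. Whether a fix belongs in the saver or in the truncation (for example, storing an empty array instead of {{null}}) is left open here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are HDFS APIs.
class NullBlocksSketch {
    // Stand-in for BlockInfo: only the block ID matters here.
    static final class Block {
        final long id;
        Block(long id) { this.id = id; }
    }

    // Mirrors the quoted truncation: keep the first n blocks, but
    // represent "no blocks left" as null rather than an empty array.
    static Block[] truncate(Block[] oldBlocks, int n) {
        if (n == 0) {
            return null;
        }
        Block[] newBlocks = new Block[n];
        System.arraycopy(oldBlocks, 0, newBlocks, 0, n);
        return newBlocks;
    }

    // Mirrors the quoted save loop: iterating a null array throws NPE.
    static List<Long> saveUnguarded(Block[] blocks) {
        List<Long> ids = new ArrayList<>();
        for (Block b : blocks) {
            ids.add(b.id);
        }
        return ids;
    }

    // One possible defensive variant: treat null as "no blocks".
    static List<Long> saveGuarded(Block[] blocks) {
        List<Long> ids = new ArrayList<>();
        if (blocks != null) {
            for (Block b : blocks) {
                ids.add(b.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Block[] oldBlocks = { new Block(1), new Block(2) };
        Block[] kept = truncate(oldBlocks, 0); // null, as in the n == 0 branch
        System.out.println("guarded: " + saveGuarded(kept));
        try {
            saveUnguarded(kept);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
    }
}
```

Running {{main}} prints an empty list for the guarded loop and reports the {{NullPointerException}} for the unguarded one.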

When attempting to save an fsimage after this code has been run, errors like 
the following will appear in the logs:

{noformat}
2014-06-18 16:55:11,295 ERROR namenode.FSImage (FSImage.java:run(988)) - Unable to save image for /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name1
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(FSImageFormatPBINode.java:537)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(FSImageFormatPBINode.java:518)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.serializeINodeSection(FSImageFormatPBINode.java:491)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInodes(FSImageFormatProtobuf.java:412)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInternal(FSImageFormatProtobuf.java:457)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.save(FSImageFormatProtobuf.java:393)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:931)
        at org.apache.hadoop.hdfs.server.namenode.FSImage$FSImageSaver.run(FSImage.java:982)
        at java.lang.Thread.run(Thread.java:724)
2014-06-18 16:55:11,295 ERROR namenode.FSImage (FSImage.java:run(988)) - Unable to save image for /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name2
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(FSImageFormatPBINode.java:537)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(FSImageFormatPBINode.java:518)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.serializeINodeSection(FSImageFormatPBINode.java:491)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInodes(FSImageFormatProtobuf.java:412)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInternal(FSImageFormatProtobuf.java:457)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.save(FSImageFormatProtobuf.java:393)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:931)
        at org.apache.hadoop.hdfs.server.namenode.FSImage$FSImageSaver.run(FSImage.java:982)
        at java.lang.Thread.run(Thread.java:724)
2014-06-18 16:55:11,297 ERROR common.Storage (NNStorage.java:reportErrorsOnDirectory(808)) - Error reported on storage directory Storage Directory /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name1
2014-06-18 16:55:11,297 WARN  common.Storage (NNStorage.java:reportErrorsOnDirectory(813)) - About to remove corresponding storage: /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name1
2014-06-18 16:55:11,297 ERROR common.Storage (NNStorage.java:reportErrorsOnDirectory(808)) - Error reported on storage directory Storage Directory /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name2
2014-06-18 16:55:11,297 WARN  common.Storage (NNStorage.java:reportErrorsOnDirectory(813)) - About to remove corresponding storage: /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name2
{noformat}

> NameNode cannot save fsimage in certain circumstances when snapshots are in use
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-6563
>                 URL: https://issues.apache.org/jira/browse/HDFS-6563
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, snapshots
>    Affects Versions: 2.4.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>            Priority: Critical
>
> Checkpoints will start to fail, and manual saveNamespace operations on the 
> NameNode will also fail, if the following sequence of steps occurs:
> # A zero-length file appears in a snapshot
> # That file is later lengthened to include at least one block
> # That file is subsequently deleted from the present file system but remains in the snapshot
> More details in the first comment.
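The reproduction steps above can be modeled in the same spirit. The sketch below is a hypothetical, heavily simplified model, not the real {{INodeFile}} or snapshot code: a file is an array of block sizes plus the lengths captured by snapshots, and deleting it from the current tree keeps only the leading blocks needed to cover the longest snapshotted length. That covering rule is an assumption about how {{collectBlocksBeyondMax}} arrives at {{n}}. Under it, a snapshot taken while the file was empty yields {{n == 0}}, so the file is left with {{null}} blocks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the reported scenario; not HDFS code.
class SnapshotScenarioSketch {
    // Simplified file: per-block byte counts for the current file,
    // plus the file length captured by each snapshot.
    static final class SimpleFile {
        long[] blockSizes = new long[0];
        final List<Long> snapshotLengths = new ArrayList<>();

        long length() {
            long total = 0;
            for (long s : blockSizes) {
                total += s;
            }
            return total;
        }
    }

    // Step 1: a snapshot records the file's current length.
    static void snapshot(SimpleFile f) {
        f.snapshotLengths.add(f.length());
    }

    // Step 2: the file is lengthened by one block of the given size.
    static void appendBlock(SimpleFile f, long bytes) {
        long[] grown = new long[f.blockSizes.length + 1];
        System.arraycopy(f.blockSizes, 0, grown, 0, f.blockSizes.length);
        grown[grown.length - 1] = bytes;
        f.blockSizes = grown;
    }

    // Step 3: the file is deleted from the current tree. Assumed rule: keep
    // the leading blocks that cover the longest snapshotted length and, as
    // in the quoted code, use null when no blocks remain.
    static void deleteFromCurrentTree(SimpleFile f) {
        long max = 0;
        for (long len : f.snapshotLengths) {
            max = Math.max(max, len);
        }
        int n = 0;
        long covered = 0;
        while (n < f.blockSizes.length && covered < max) {
            covered += f.blockSizes[n];
            n++;
        }
        if (n == 0) {
            f.blockSizes = null;
        } else {
            long[] kept = new long[n];
            System.arraycopy(f.blockSizes, 0, kept, 0, n);
            f.blockSizes = kept;
        }
    }

    public static void main(String[] args) {
        SimpleFile f = new SimpleFile();
        snapshot(f);              // snapshot records length 0
        appendBlock(f, 1024);     // file now has one block
        deleteFromCurrentTree(f); // longest snapshotted length is 0, so n == 0
        System.out.println(f.blockSizes == null
                ? "blocks: null"
                : "blocks: " + f.blockSizes.length);
    }
}
```

If the snapshot is instead taken after the first append, the longest snapshotted length covers one block, {{n}} is 1, and the block array stays non-null.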



--
This message was sent by Atlassian JIRA
(v6.2#6252)
