[ https://issues.apache.org/jira/browse/HDFS-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036702#comment-14036702 ]
Aaron T. Myers commented on HDFS-6563:
--------------------------------------
I've filed this as critical for now, but if folks think this should be a
blocker I'm fine raising the priority.
Though the impact is serious, the bug itself is fairly straightforward. In
{{FSImageFormatPBINode.Saver#save(OutputStream, INodeFile)}} we have the
following code:
{code}
for (Block block : n.getBlocks()) {
  b.addBlocks(PBHelper.convert(block));
}
{code}
This implicitly assumes that {{n.getBlocks()}} will never return {{null}},
since a for-each loop over a {{null}} array throws a
{{NullPointerException}}. However, {{null}} is possible in the scenario
described in this issue because of this code in
{{FileWithSnapshotFeature#collectBlocksBeyondMax}}:
{code}
final BlockInfo[] newBlocks;
if (n == 0) {
  newBlocks = null;
} else {
  newBlocks = new BlockInfo[n];
  System.arraycopy(oldBlocks, 0, newBlocks, 0, n);
}

// set new blocks
file.setBlocks(newBlocks);
{code}
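To make the interaction between the two snippets concrete, here is a minimal, self-contained sketch. It is not the HDFS code and not a proposed patch: {{NullBlocksSketch}}, {{FileModel}}, {{truncateAsQuoted}}, {{saveAsQuoted}} and {{saveWithNullGuard}} are hypothetical names, and blocks are modeled as plain {{long}} IDs. It only illustrates that truncating to zero blocks leaves a {{null}} array, that the for-each loop then throws, and that a null check is one possible way to avoid that.

```java
import java.util.ArrayList;
import java.util.List;

public class NullBlocksSketch {
    /** Hypothetical stand-in for an INodeFile; it only holds a block array. */
    static final class FileModel {
        private long[] blocks;
        FileModel(long[] blocks) { this.blocks = blocks; }
        long[] getBlocks() { return blocks; }
        void setBlocks(long[] blocks) { this.blocks = blocks; }
    }

    /** Mirrors the quoted truncation logic: keep the first n blocks, or null when n == 0. */
    static void truncateAsQuoted(FileModel file, int n) {
        long[] oldBlocks = file.getBlocks();
        final long[] newBlocks;
        if (n == 0) {
            newBlocks = null;
        } else {
            newBlocks = new long[n];
            System.arraycopy(oldBlocks, 0, newBlocks, 0, n);
        }
        file.setBlocks(newBlocks);
    }

    /** Mirrors the quoted save loop: iterates getBlocks() without a null check. */
    static List<Long> saveAsQuoted(FileModel file) {
        List<Long> written = new ArrayList<>();
        for (long block : file.getBlocks()) { // throws NullPointerException if the array is null
            written.add(block);
        }
        return written;
    }

    /** One possible guard (an assumption, not the actual fix): treat null as "no blocks". */
    static List<Long> saveWithNullGuard(FileModel file) {
        List<Long> written = new ArrayList<>();
        long[] blocks = file.getBlocks();
        if (blocks != null) {
            for (long block : blocks) {
                written.add(block);
            }
        }
        return written;
    }

    public static void main(String[] args) {
        // A file that currently has one block but had zero length in the snapshot.
        FileModel file = new FileModel(new long[] {1001L});
        truncateAsQuoted(file, 0);

        try {
            saveAsQuoted(file);
        } catch (NullPointerException e) {
            System.out.println("unguarded save failed with NullPointerException");
        }
        System.out.println("guarded save wrote " + saveWithNullGuard(file).size() + " blocks");
    }
}
```

An alternative to guarding the reader would be for the truncation side to store an empty array instead of {{null}} when {{n == 0}}, so that every caller of {{getBlocks()}} stays safe; which side to change is a design choice this sketch does not settle.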
When attempting to save an fsimage after this code has been run, errors like
the following will appear in the logs:
{noformat}
2014-06-18 16:55:11,295 ERROR namenode.FSImage (FSImage.java:run(988)) - Unable to save image for /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name1
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(FSImageFormatPBINode.java:537)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(FSImageFormatPBINode.java:518)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.serializeINodeSection(FSImageFormatPBINode.java:491)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInodes(FSImageFormatProtobuf.java:412)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInternal(FSImageFormatProtobuf.java:457)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.save(FSImageFormatProtobuf.java:393)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:931)
    at org.apache.hadoop.hdfs.server.namenode.FSImage$FSImageSaver.run(FSImage.java:982)
    at java.lang.Thread.run(Thread.java:724)
2014-06-18 16:55:11,295 ERROR namenode.FSImage (FSImage.java:run(988)) - Unable to save image for /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name2
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(FSImageFormatPBINode.java:537)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(FSImageFormatPBINode.java:518)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.serializeINodeSection(FSImageFormatPBINode.java:491)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInodes(FSImageFormatProtobuf.java:412)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.saveInternal(FSImageFormatProtobuf.java:457)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Saver.save(FSImageFormatProtobuf.java:393)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:931)
    at org.apache.hadoop.hdfs.server.namenode.FSImage$FSImageSaver.run(FSImage.java:982)
    at java.lang.Thread.run(Thread.java:724)
2014-06-18 16:55:11,297 ERROR common.Storage (NNStorage.java:reportErrorsOnDirectory(808)) - Error reported on storage directory Storage Directory /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name1
2014-06-18 16:55:11,297 WARN common.Storage (NNStorage.java:reportErrorsOnDirectory(813)) - About to remove corresponding storage: /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name1
2014-06-18 16:55:11,297 ERROR common.Storage (NNStorage.java:reportErrorsOnDirectory(808)) - Error reported on storage directory Storage Directory /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name2
2014-06-18 16:55:11,297 WARN common.Storage (NNStorage.java:reportErrorsOnDirectory(813)) - About to remove corresponding storage: /home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/build/test/data/dfs/name2
{noformat}
> NameNode cannot save fsimage in certain circumstances when snapshots are in
> use
> -------------------------------------------------------------------------------
>
> Key: HDFS-6563
> URL: https://issues.apache.org/jira/browse/HDFS-6563
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode, snapshots
> Affects Versions: 2.4.0
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
> Priority: Critical
>
> Checkpoints will start to fail and the NameNode will not be able to manually
> saveNamespace if the following set of steps occurs:
> # A zero-length file appears in a snapshot
> # That file is later lengthened to include at least one block
> # That file is subsequently deleted from the present file system but remains
> in the snapshot
> More details in the first comment.