[ https://issues.apache.org/jira/browse/HDFS-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jing Zhao updated HDFS-5982:
----------------------------
Description: Currently, after deleting a snapshottable directory that no longer
has any snapshots, we also remove the directory from the snapshottable
directory list in SnapshotManager. This works fine when handling a delete
request from a user. However, when we apply an OP_DELETE edit log entry,
FSDirectory#unprotectedDelete(String, long) is called, which does not include
the "updating snapshot manager" step. This may leave a non-existent inode
id in the snapshottable directory list, and can even lead to FSImage
corruption. (was: I stopped and started Namenode with no operations and hit
an issue where the NN cannot start up again. There is an NPE in the NN logs.
{code}
2014-02-19 01:59:04,616 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotFSImageFormat.loadFileDiff(SnapshotFSImageFormat.java:131)
    at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotFSImageFormat.loadFileDiffList(SnapshotFSImageFormat.java:111)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadINode(FSImageFormat.java:688)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadINodeWithLocalName(FSImageFormat.java:636)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadChildren(FSImageFormat.java:468)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:510)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:519)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:519)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:519)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:519)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadLocalNameINodesWithSnapshot(FSImageFormat.java:412)
    at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:350)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:832)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:821)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:669)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:638)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:265)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:856)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:616)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:434)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:490)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:646)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:631)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1270)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1336)
2014-02-19 01:59:04,619 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
{code})
> Restarting Namenode can have NPE.
> ---------------------------------
>
> Key: HDFS-5982
> URL: https://issues.apache.org/jira/browse/HDFS-5982
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.3.0
> Reporter: Tassapol Athiapinya
> Assignee: Jing Zhao
> Priority: Critical
> Fix For: 2.3.0
>
>
> Currently, after deleting a snapshottable directory that no longer has any
> snapshots, we also remove the directory from the snapshottable directory
> list in SnapshotManager. This works fine when handling a delete request
> from a user. However, when we apply an OP_DELETE edit log entry,
> FSDirectory#unprotectedDelete(String, long) is called, which does not include
> the "updating snapshot manager" step. This may leave a non-existent inode
> id in the snapshottable directory list, and can even lead to FSImage
> corruption.
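
To make the divergence between the two delete paths concrete, here is a minimal toy model in Java. It is only a sketch and does not use the real HDFS classes: SnapshottableListSketch and all of its methods are hypothetical names, the namespace is reduced to a path-to-inode-id map, and the snapshottable directory list is reduced to a set of inode ids. The replayDeleteWithCleanup method shows one possible remedy under these assumptions; it is not claimed to be the change made for this issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical, not the HDFS classes) of the two delete paths. */
public class SnapshottableListSketch {
    // Stand-in for the namespace: path -> inode id.
    private final Map<String, Long> inodes = new HashMap<>();
    // Stand-in for SnapshotManager's snapshottable directory list (inode ids).
    private final Set<Long> snapshottable = new HashSet<>();
    private long nextId = 1;

    /** Creates a directory and marks it snapshottable; returns its inode id. */
    public long mkdirSnapshottable(String path) {
        long id = nextId++;
        inodes.put(path, id);
        snapshottable.add(id);
        return id;
    }

    /** Shared core of both paths: removes the inode from the namespace only. */
    private Long removeInode(String path) {
        return inodes.remove(path);
    }

    /** User-request path: also removes the directory from the snapshottable list. */
    public void deleteFromUser(String path) {
        Long id = removeInode(path);
        if (id != null) {
            snapshottable.remove(id);
        }
    }

    /** Edit log replay as described in the report: the list is left untouched. */
    public void replayDeleteBuggy(String path) {
        removeInode(path);
    }

    /** One possible remedy (an assumption): replay performs the same cleanup. */
    public void replayDeleteWithCleanup(String path) {
        deleteFromUser(path);
    }

    /** Ids still listed as snapshottable although their inode no longer exists. */
    public Set<Long> danglingIds() {
        Set<Long> dangling = new HashSet<>(snapshottable);
        dangling.removeAll(inodes.values());
        return dangling;
    }

    public static void main(String[] args) {
        SnapshottableListSketch buggy = new SnapshottableListSketch();
        buggy.mkdirSnapshottable("/dir");
        buggy.replayDeleteBuggy("/dir");
        System.out.println("dangling after buggy replay: " + buggy.danglingIds());

        SnapshottableListSketch cleaned = new SnapshottableListSketch();
        cleaned.mkdirSnapshottable("/dir");
        cleaned.replayDeleteWithCleanup("/dir");
        System.out.println("dangling after replay with cleanup: " + cleaned.danglingIds());
    }
}
```

In this model, a stale id left behind by the buggy replay path is the analogue of the non-existent inode id in the snapshottable directory list described above.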