[ 
https://issues.apache.org/jira/browse/HDFS-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5982:
----------------------------

    Description: Currently, after deleting a snapshottable directory that no 
longer has any snapshots, we also remove the directory from the 
snapshottable directory list in SnapshotManager. This works fine when handling 
a delete request from a user. However, when we apply an OP_DELETE edit log 
entry, FSDirectory#unprotectedDelete(String, long) is called, which does not 
include the "updating snapshot manager" step. This may leave a non-existent 
inode id in the snapshottable directory list, and can even lead to FSImage 
corruption.  (was: I stopped, started Namenode with no operations and hit into 
an issue when NN cannot start up again. There is NPE in NN logs.
 
{code}
2014-02-19 01:59:04,616 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotFSImageFormat.loadFileDiff(SnapshotFSImageFormat.java:131)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotFSImageFormat.loadFileDiffList(SnapshotFSImageFormat.java:111)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadINode(FSImageFormat.java:688)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadINodeWithLocalName(FSImageFormat.java:636)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadChildren(FSImageFormat.java:468)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:510)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:519)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:519)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:519)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:519)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadLocalNameINodesWithSnapshot(FSImageFormat.java:412)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:350)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:832)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:821)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:669)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:638)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:265)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:856)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:616)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:434)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:490)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:646)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:631)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1270)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1336)
2014-02-19 01:59:04,619 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
{code})

> Restarting Namenode can have NPE.
> ---------------------------------
>
>                 Key: HDFS-5982
>                 URL: https://issues.apache.org/jira/browse/HDFS-5982
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.3.0
>            Reporter: Tassapol Athiapinya
>            Assignee: Jing Zhao
>            Priority: Critical
>             Fix For: 2.3.0
>
>
> Currently, after deleting a snapshottable directory that no longer has any 
> snapshots, we also remove the directory from the snapshottable directory 
> list in SnapshotManager. This works fine when handling a delete request from 
> a user. However, when we apply an OP_DELETE edit log entry, 
> FSDirectory#unprotectedDelete(String, long) is called, which does not 
> include the "updating snapshot manager" step. This may leave a non-existent 
> inode id in the snapshottable directory list, and can even lead to FSImage 
> corruption.
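
To illustrate the divergence described above, here is a minimal toy model, not Hadoop code. The class SnapshottableListSketch and all of its methods and fields are hypothetical stand-ins: a map plays the role of the namespace and a set plays the role of the snapshottable directory list in SnapshotManager. It only shows how a replay path that skips the list update can leave a stale inode id behind, and one possible way to close the gap; the actual patch may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the issue; every name here is hypothetical, not Hadoop's API. */
class SnapshottableListSketch {
    /** Stand-in for the namespace: path -> inode id. */
    final Map<String, Long> inodes = new HashMap<>();
    /** Stand-in for the snapshottable directory list (inode ids). */
    final Set<Long> snapshottableIds = new HashSet<>();

    /** Creates a directory and marks it snapshottable. */
    void mkdirSnapshottable(String path, long id) {
        inodes.put(path, id);
        snapshottableIds.add(id);
    }

    /** User-request path: removes the inode and updates the list. */
    void deleteFromUser(String path) {
        Long id = inodes.remove(path);
        if (id != null) {
            snapshottableIds.remove(id);
        }
    }

    /** Replay path as described in the report: removes the inode only. */
    void replayDeleteWithoutListUpdate(String path) {
        inodes.remove(path);
    }

    /** One possible repair (an assumption): replay also updates the list. */
    void replayDeleteWithListUpdate(String path) {
        Long id = inodes.remove(path);
        if (id != null) {
            snapshottableIds.remove(id);
        }
    }

    /** Ids still in the snapshottable list whose inode no longer exists. */
    Set<Long> staleIds() {
        Set<Long> stale = new HashSet<>(snapshottableIds);
        stale.removeAll(inodes.values());
        return stale;
    }

    public static void main(String[] args) {
        SnapshottableListSketch buggy = new SnapshottableListSketch();
        buggy.mkdirSnapshottable("/dir", 1L);
        buggy.replayDeleteWithoutListUpdate("/dir");
        System.out.println("stale ids without list update: " + buggy.staleIds());

        SnapshottableListSketch repaired = new SnapshottableListSketch();
        repaired.mkdirSnapshottable("/dir", 1L);
        repaired.replayDeleteWithListUpdate("/dir");
        System.out.println("stale ids with list update: " + repaired.staleIds());
    }
}
```

In this model, a stale id corresponds to the "non-existent inode id" in the description: anything that later walks the snapshottable list and resolves ids against the namespace would find nothing there.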



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
