[ https://issues.apache.org/jira/browse/HDFS-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122658#comment-15122658 ]
Yongjun Zhang commented on HDFS-9697:
-------------------------------------

Hi [~jingzhao],

I agree with you. Oddly, I tried 7f46636495e23693d588b0915f464fa7afd9102e, which is the latest trunk tip, and I still cannot reproduce the exception stack that you and [~vinayrpet] were able to see. May I know the tip commit of your build?

Thanks.

> NN fails to restart due to corrupt fsimage caused by snapshot handling
> ----------------------------------------------------------------------
>
>                 Key: HDFS-9697
>                 URL: https://issues.apache.org/jira/browse/HDFS-9697
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>
> This is related to HDFS-9406, but not quite the same symptom.
> {quote}
> ERROR namenode.NameNode: Failed to start namenode.
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadINodeReference(FSImageFormatPBSnapshot.java:114)
>     at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadINodeReferenceSection(FSImageFormatPBSnapshot.java:105)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:258)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1062)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:766)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:589)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:646)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:818)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:797)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1561)
> {quote}
> A sequence that I found can reproduce the exception stack is:
> {code}
> hadoop fs -mkdir /st
> hadoop fs -mkdir /st/y
> hadoop fs -mkdir /nonst
> hadoop fs -mkdir /nonst/trash
> hdfs dfsadmin -allowSnapshot /st
> hdfs dfs -createSnapshot /st s0
> hadoop fs -touchz /st/y/nn.log
> hdfs dfs -createSnapshot /st s1
> hadoop fs -mv /st/y/nn.log /st/y/nn1.log
> hdfs dfs -createSnapshot /st s2
> hadoop fs -mkdir /nonst/trash/st
> hadoop fs -mv /st/y /nonst/trash/st
> hadoop fs -rmr /nonst/trash
> hdfs dfs -deleteSnapshot /st s1
> hdfs dfs -deleteSnapshot /st s2
> hdfs dfsadmin -safemode enter
> hdfs dfsadmin -saveNamespace
> hdfs dfsadmin -safemode leave
> {code}
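
For anyone comparing results across builds, the reproduction sequence from the issue description could be wrapped in a small script so that every run issues exactly the same commands. The following is only a sketch, not something attached to the issue: the `DRY_RUN` variable and the `run` and `repro_hdfs_9697` function names are hypothetical helpers introduced here, and the Hadoop commands themselves are copied unchanged from the description. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: replay the HDFS-9697 reproduction sequence from the issue description.
# DRY_RUN, run and repro_hdfs_9697 are hypothetical names, not part of Hadoop.
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    # Execute the command and report which step failed, if any.
    "$@" || { echo "FAILED: $*" >&2; return 1; }
  fi
}

repro_hdfs_9697() {
  # The && chain stops at the first failing step.
  run hadoop fs -mkdir /st &&
  run hadoop fs -mkdir /st/y &&
  run hadoop fs -mkdir /nonst &&
  run hadoop fs -mkdir /nonst/trash &&
  run hdfs dfsadmin -allowSnapshot /st &&
  run hdfs dfs -createSnapshot /st s0 &&
  run hadoop fs -touchz /st/y/nn.log &&
  run hdfs dfs -createSnapshot /st s1 &&
  run hadoop fs -mv /st/y/nn.log /st/y/nn1.log &&
  run hdfs dfs -createSnapshot /st s2 &&
  run hadoop fs -mkdir /nonst/trash/st &&
  run hadoop fs -mv /st/y /nonst/trash/st &&
  run hadoop fs -rmr /nonst/trash &&
  run hdfs dfs -deleteSnapshot /st s1 &&
  run hdfs dfs -deleteSnapshot /st s2 &&
  run hdfs dfsadmin -safemode enter &&
  run hdfs dfsadmin -saveNamespace &&
  run hdfs dfsadmin -safemode leave
}
```

Calling `repro_hdfs_9697` as-is prints the 18 commands for review; `DRY_RUN=0 repro_hdfs_9697` would execute them against whatever cluster the local Hadoop configuration points to, so it should only be used on a disposable test cluster. The script does not restart the NameNode; since the reported failure occurs at NameNode startup, a restart after the run is presumably needed to see whether the saved fsimage loads.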