[ https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106272#comment-14106272 ]
Jing Zhao commented on HDFS-6908:
---------------------------------

Thanks for the response, [~j...@cloudera.com].

bq. so there are create/delete pair operations for those files.

The challenge here is that we cannot guarantee that we always have the create/delete pair. Imagine that the deletion happens on the directory while the creation happens on a file under that directory. Then we cannot depend on the snapshot diff combination to clean up the file. The following unit test (based on your original test case) demonstrates the scenario (note that with your patch the test fails before it reaches the leak check):

{code}
@Test (timeout=60000)
public void testDeleteSnapshot() throws Exception {
  final Path root = new Path("/");
  Path dir = new Path("/dir1");
  Path file1 = new Path(dir, "file1");
  DFSTestUtil.createFile(hdfs, file1, BLOCKSIZE, REPLICATION, seed);

  hdfs.allowSnapshot(root);
  hdfs.createSnapshot(root, "s1");

  // file2 is created after s1, so it is captured only by s2
  Path file2 = new Path(dir, "file2");
  DFSTestUtil.createFile(hdfs, file2, BLOCKSIZE, REPLICATION, seed);
  INodeFile file2Node = fsdir.getINode(file2.toString()).asFile();
  long file2NodeId = file2Node.getId();

  hdfs.createSnapshot(root, "s2");

  // delete directory
  assertTrue(hdfs.delete(dir, true));
  // file2 is still referenced by snapshot s2
  assertNotNull(fsdir.getInode(file2NodeId));

  // delete second snapshot
  hdfs.deleteSnapshot(root, "s2");
  // no snapshot references file2 any more, so its inode must be gone
  assertNull(fsdir.getInode(file2NodeId));

  NameNodeAdapter.enterSafeMode(cluster.getNameNode(), false);
  NameNodeAdapter.saveNamespace(cluster.getNameNode());

  // restart NN
  cluster.restartNameNodes();
}
{code}

> incorrect snapshot directory diff generated by snapshot deletion
> ----------------------------------------------------------------
>
>                 Key: HDFS-6908
>                 URL: https://issues.apache.org/jira/browse/HDFS-6908
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>            Reporter: Juan Yu
>            Assignee: Juan Yu
>            Priority: Critical
>         Attachments: HDFS-6908.001.patch
>
>
> In the following scenario, delete snapshot could generate incorrect snapshot
> directory diff and corrupted fsimage, if you restart NN after that, you will
> get NullPointerException.
> 1. create a directory and create a file under it
> 2. take a snapshot
> 3. create another file under that directory
> 4. take second snapshot
> 5. delete both files and the directory
> 6. delete second snapshot
> incorrect directory diff will be generated.
> Restart NN will throw NPE
> {code}
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.addToDeletedList(FSImageFormatPBSnapshot.java:246)
>     at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDeletedList(FSImageFormatPBSnapshot.java:265)
>     at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDirectoryDiffList(FSImageFormatPBSnapshot.java:328)
>     at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadSnapshotDiffSection(FSImageFormatPBSnapshot.java:192)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:254)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:208)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:906)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:892)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:715)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:653)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:276)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:882)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:629)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:498)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:554)
> {code}