[ https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106231#comment-14106231 ]
Juan Yu commented on HDFS-6908: ------------------------------- Thanks [~jingzhao]. because the directory is deleted, it means the file created between prior snapshot and the deleting one must be deleted as well. so there are create/delete pair operations for those files. the file diff processing part will add the file to removedINodes list. when I debug the fix, I saw the inode for the file are deleted correctly, no leak. and the intermediate create/delete file change is cleaned after combining the diff with prior one as well. {code} } else if (topNode.isFile() && topNode.asFile().isWithSnapshot()) { INodeFile file = topNode.asFile(); counts.add(file.getDiffs().deleteSnapshotDiff(post, prior, file, collectedBlocks, removedINodes, countDiffChange)); {code} > incorrect snapshot directory diff generated by snapshot deletion > ---------------------------------------------------------------- > > Key: HDFS-6908 > URL: https://issues.apache.org/jira/browse/HDFS-6908 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots > Reporter: Juan Yu > Assignee: Juan Yu > Priority: Critical > Attachments: HDFS-6908.001.patch > > > In the following scenario, delete snapshot could generate incorrect snapshot > directory diff and corrupted fsimage, if you restart NN after that, you will > get NullPointerException. > 1. create a directory and create a file under it > 2. take a snapshot > 3. create another file under that directory > 4. take second snapshot > 5. delete both files and the directory > 6. delete second snapshot > incorrect directory diff will be generated. > Restart NN will throw NPE > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.addToDeletedList(FSImageFormatPBSnapshot.java:246) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDeletedList(FSImageFormatPBSnapshot.java:265) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDirectoryDiffList(FSImageFormatPBSnapshot.java:328) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadSnapshotDiffSection(FSImageFormatPBSnapshot.java:192) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:254) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:208) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:906) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:892) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:715) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:653) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:276) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:629) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:498) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:554) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)