[ https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106507#comment-14106507 ]
Juan Yu commented on HDFS-6908: ------------------------------- [~jingzhao]] Thanks for the new unit test and explain the difference. I assumed when deleting a directory recursively, all children will be added to the diff list. but that's not how the implementation is done. snapshot diff only record directory deletion. so the fix you suggested is better. One more question, I think what's really needed is to call {{cleanSubtreeRecursively}} before {{destroyCreatedList}}, isn't it? {code} + counts.add(currentINode.cleanSubtreeRecursively(snapshot, prior, + collectedBlocks, removedINodes, priorDeleted, countDiffChange)); // delete everything in created list DirectoryDiff lastDiff = diffs.getLast(); if (lastDiff != null) { counts.add(lastDiff.diff.destroyCreatedList(currentINode, collectedBlocks, removedINodes)); } } else { // update prior prior = getDiffs().updatePrior(snapshot, prior); @@ -739,7 +741,10 @@ boolean computeDiffBetweenSnapshots(Snapshot fromSnapshot, counts.add(getDiffs().deleteSnapshotDiff(snapshot, prior, currentINode, collectedBlocks, removedINodes, countDiffChange)); - + + counts.add(currentINode.cleanSubtreeRecursively(snapshot, prior, + collectedBlocks, removedINodes, priorDeleted, countDiffChange)); + // check priorDiff again since it may be created during the diff deletion if (prior != Snapshot.NO_SNAPSHOT_ID) { DirectoryDiff priorDiff = this.getDiffs().getDiffById(prior); @@ -778,9 +783,7 @@ boolean computeDiffBetweenSnapshots(Snapshot fromSnapshot, } } } - counts.add(currentINode.cleanSubtreeRecursively(snapshot, prior, - collectedBlocks, removedINodes, priorDeleted, countDiffChange)); - + if (currentINode.isQuotaSet()) { currentINode.getDirectoryWithQuotaFeature().addSpaceConsumed2Cache( -counts.get(Quota.NAMESPACE), -counts.get(Quota.DISKSPACE)); {code} > incorrect snapshot directory diff generated by snapshot deletion > ---------------------------------------------------------------- > > Key: HDFS-6908 > URL: https://issues.apache.org/jira/browse/HDFS-6908 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots > Reporter: Juan Yu > Assignee: Juan Yu > Priority: Critical > Attachments: HDFS-6908.001.patch > > > In the following scenario, delete snapshot could generate incorrect snapshot > directory diff and corrupted fsimage, if you restart NN after that, you will > get NullPointerException. > 1. create a directory and create a file under it > 2. take a snapshot > 3. create another file under that directory > 4. take second snapshot > 5. delete both files and the directory > 6. delete second snapshot > incorrect directory diff will be generated. > Restart NN will throw NPE > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.addToDeletedList(FSImageFormatPBSnapshot.java:246) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDeletedList(FSImageFormatPBSnapshot.java:265) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDirectoryDiffList(FSImageFormatPBSnapshot.java:328) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadSnapshotDiffSection(FSImageFormatPBSnapshot.java:192) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:254) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:208) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:906) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:892) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:715) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:653) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:276) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:629) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:498) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:554) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)