[
https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Juan Yu updated HDFS-6908:
--------------------------
Attachment: HDFS-6908.001.patch
The problem is when deleting snapshot, hdfs will clean Inode that not in any
snapshot any more. however, the logic is a bit wrong.
if an inode is directory, it calls destroyCreatedList() to put created children
inodes(created between prior snapshot and the deleting one) to removedINodes
list, but also clear the createdList. this breaks the create/delete pair
operation. so later, when combine the diff with prior diff, the delete
operation stays.
I think instead of calling destroyCreatedList(), it should just do
dir.removeChild(c), and let the file diff does cleanup for files.
> incorrect snapshot directory diff generated by snapshot deletion
> ----------------------------------------------------------------
>
> Key: HDFS-6908
> URL: https://issues.apache.org/jira/browse/HDFS-6908
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Juan Yu
> Assignee: Juan Yu
> Attachments: HDFS-6908.001.patch
>
>
> In the following scenario, delete snapshot could generate incorrect snapshot
> directory diff and corrupted fsimage, if you restart NN after that, you will
> get NullPointerException.
> 1. create a directory and create a file under it
> 2. take a snapshot
> 3. create another file under that directory
> 4. take second snapshot
> 5. delete both files and the directory
> 6. delete second snapshot
> incorrect directory diff will be generated.
> Restart NN will throw NPE
> {code}
> java.lang.NullPointerException
> at
> org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.addToDeletedList(FSImageFormatPBSnapshot.java:246)
> at
> org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDeletedList(FSImageFormatPBSnapshot.java:265)
> at
> org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDirectoryDiffList(FSImageFormatPBSnapshot.java:328)
> at
> org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadSnapshotDiffSection(FSImageFormatPBSnapshot.java:192)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:254)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:208)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:906)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:892)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:715)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:653)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:276)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:882)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:629)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:498)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:554)
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)