[
https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106272#comment-14106272
]
Jing Zhao edited comment on HDFS-6908 at 8/22/14 1:03 AM:
----------------------------------------------------------
Thanks for the response, [[email protected]].
bq. so there are create/delete pair operations for those files.
The challenge here is that we cannot guarantee we always have the create/delete
pair. Imagine the deletion is recorded on the directory while the creation was
recorded on a file under that directory: then we cannot depend on combining the
snapshot diffs to clean up the file. The following unit test (based on your
original test case) demonstrates the scenario (though with your patch the test
hits another exception before the leaking check):
{code}
@Test (timeout=60000)
public void testDeleteSnapshot() throws Exception {
  final Path root = new Path("/");
  Path dir = new Path("/dir1");
  Path file1 = new Path(dir, "file1");
  DFSTestUtil.createFile(hdfs, file1, BLOCKSIZE, REPLICATION, seed);

  hdfs.allowSnapshot(root);
  hdfs.createSnapshot(root, "s1");

  // file2 exists in s2 but not in s1: it is created between the two snapshots
  Path file2 = new Path(dir, "file2");
  DFSTestUtil.createFile(hdfs, file2, BLOCKSIZE, REPLICATION, seed);
  INodeFile file2Node = fsdir.getINode(file2.toString()).asFile();
  long file2NodeId = file2Node.getId();
  hdfs.createSnapshot(root, "s2");

  // delete the directory; file2's inode must survive because s2 still
  // references it
  assertTrue(hdfs.delete(dir, true));
  assertNotNull(fsdir.getInode(file2NodeId));

  // delete the second snapshot; file2's inode should now be cleaned up,
  // otherwise it leaks
  hdfs.deleteSnapshot(root, "s2");
  assertNull(fsdir.getInode(file2NodeId));

  // save the namespace and restart the NN to verify the fsimage is still
  // loadable
  NameNodeAdapter.enterSafeMode(cluster.getNameNode(), false);
  NameNodeAdapter.saveNamespace(cluster.getNameNode());
  cluster.restartNameNodes();
}
{code}
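To make the pairing problem concrete, here is a toy sketch of diff combination. This is not the actual HDFS implementation; the class, fields, and method are made up purely for illustration. The point it shows: a create can only be cancelled by a delete recorded in the *same* directory's diff.
{code}
import java.util.ArrayList;
import java.util.List;

// Toy model of one directory's snapshot diff: the names created and
// deleted directly under that directory between two snapshots.
class DirDiff {
  final List<String> created = new ArrayList<>();
  final List<String> deleted = new ArrayList<>();

  // Merge a posterior diff into this one (roughly what deleting the
  // later snapshot does). A name is only reclaimable when its create
  // and delete meet in the same directory's diff.
  List<String> combine(DirDiff posterior) {
    List<String> reclaimable = new ArrayList<>();
    for (String name : posterior.deleted) {
      if (created.remove(name)) {
        reclaimable.add(name);   // create/delete pair found: safe to clean
      } else {
        deleted.add(name);       // no pair here: carry the deletion over
      }
    }
    created.addAll(posterior.created);
    return reclaimable;
  }

  public static void main(String[] args) {
    // Root's diff between s1 and s2 is empty: file2 was created inside
    // /dir1, so the create went into dir1's own diff.
    DirDiff rootS1 = new DirDiff();
    // Root's diff after s2: /dir1 itself was deleted.
    DirDiff rootS2 = new DirDiff();
    rootS2.deleted.add("dir1");

    // Deleting s2 combines at the root level only; file2's create never
    // pairs with a delete, so nothing is reclaimed here.
    System.out.println(rootS1.combine(rootS2)); // prints []
  }
}
{code}
In the test above, file2's create is recorded in /dir1's diff while the deletion is recorded on /dir1 itself, one level up, so the two records never meet when s2 is deleted.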
> incorrect snapshot directory diff generated by snapshot deletion
> ----------------------------------------------------------------
>
> Key: HDFS-6908
> URL: https://issues.apache.org/jira/browse/HDFS-6908
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Juan Yu
> Assignee: Juan Yu
> Priority: Critical
> Attachments: HDFS-6908.001.patch
>
>
> In the following scenario, deleting a snapshot can generate an incorrect
> snapshot directory diff and a corrupted fsimage; if you restart the NN after
> that, you will get a NullPointerException.
> 1. create a directory and create a file under it
> 2. take a snapshot
> 3. create another file under that directory
> 4. take a second snapshot
> 5. delete both files and the directory
> 6. delete the second snapshot
> An incorrect directory diff will be generated, and restarting the NN will
> throw an NPE:
> {code}
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.addToDeletedList(FSImageFormatPBSnapshot.java:246)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDeletedList(FSImageFormatPBSnapshot.java:265)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDirectoryDiffList(FSImageFormatPBSnapshot.java:328)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadSnapshotDiffSection(FSImageFormatPBSnapshot.java:192)
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:254)
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:208)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:906)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:892)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:715)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:653)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:276)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:882)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:629)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:498)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:554)
> {code}
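For completeness, the six reproduction steps above map onto plain HDFS client calls roughly as follows (a sketch only; it assumes an {{hdfs}} handle to a DistributedFileSystem on a test cluster, mirroring the unit test earlier in this comment):
{code}
Path root = new Path("/");
Path dir = new Path("/dir1");
Path file1 = new Path(dir, "file1");
Path file2 = new Path(dir, "file2");

hdfs.mkdirs(dir);
hdfs.create(file1).close();        // 1. a directory with a file under it
hdfs.allowSnapshot(root);
hdfs.createSnapshot(root, "s1");   // 2. first snapshot
hdfs.create(file2).close();        // 3. another file under the directory
hdfs.createSnapshot(root, "s2");   // 4. second snapshot
hdfs.delete(file1, false);         // 5. delete both files and the directory
hdfs.delete(file2, false);
hdfs.delete(dir, true);
hdfs.deleteSnapshot(root, "s2");   // 6. delete the second snapshot
// saving the namespace and restarting the NN now fails with the NPE above
{code}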