[ https://issues.apache.org/jira/browse/HDFS-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106272#comment-14106272 ]

Jing Zhao commented on HDFS-6908:
---------------------------------

Thanks for the response, [~j...@cloudera.com].

bq. so there are create/delete pair operations for those files.

The challenge is that we cannot guarantee we always have such a create/delete 
pair. Imagine the deletion happens on the directory while the creation happened 
on a file under that directory. Then we cannot depend on the snapshot diff 
combination to clean up the file. The following unit test (based on your 
original test case) demonstrates the scenario (with your patch, the test fails 
before reaching the leaking check):
{code}
  @Test (timeout=60000)
  public void testDeleteSnapshot() throws Exception {
    final Path root = new Path("/");

    Path dir = new Path("/dir1");
    Path file1 = new Path(dir, "file1");
    DFSTestUtil.createFile(hdfs, file1, BLOCKSIZE, REPLICATION, seed);

    hdfs.allowSnapshot(root);
    hdfs.createSnapshot(root, "s1");

    Path file2 = new Path(dir, "file2");
    DFSTestUtil.createFile(hdfs, file2, BLOCKSIZE, REPLICATION, seed);
    INodeFile file2Node = fsdir.getINode(file2.toString()).asFile();
    long file2NodeId = file2Node.getId();

    hdfs.createSnapshot(root, "s2");

    // delete the directory; file2's inode should still be retained,
    // since it is captured in snapshot s2
    assertTrue(hdfs.delete(dir, true));
    assertNotNull(fsdir.getInode(file2NodeId));

    // delete the second snapshot; the last snapshot reference to file2 is gone,
    // so its inode should now be cleaned up rather than leaked
    hdfs.deleteSnapshot(root, "s2");
    assertNull(fsdir.getInode(file2NodeId));

    NameNodeAdapter.enterSafeMode(cluster.getNameNode(), false);
    NameNodeAdapter.saveNamespace(cluster.getNameNode());

    // restart NN
    cluster.restartNameNodes();
  }
{code}
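
To make the create/delete pair point concrete, here is a minimal standalone sketch. This is not the actual HDFS implementation; the class and method names below only loosely follow the real ChildrenDiff/combinePosteriorDiff logic and are simplified for illustration. It shows why combining per-directory diffs can cancel a create and a delete recorded on the same child, but cannot pair file2's create entry (recorded in dir1's own diff) with dir1's delete entry (recorded in the diff of "/"):
{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only -- not HDFS code.
public class DiffCombineSketch {

  /** A simplified per-directory children diff: names created/deleted since a snapshot. */
  static class SimpleChildrenDiff {
    final Set<String> created = new HashSet<>();
    final Set<String> deleted = new HashSet<>();

    /**
     * Merge a posterior diff into this one when the posterior snapshot is removed.
     * A name in our created list that also appears in the posterior deleted list is
     * a create/delete pair: both entries cancel and the child inode can be destroyed.
     */
    List<String> combinePosterior(SimpleChildrenDiff posterior) {
      List<String> destroyable = new ArrayList<>();
      created.addAll(posterior.created);
      for (String name : posterior.deleted) {
        if (created.remove(name)) {
          destroyable.add(name);   // create/delete pair found
        } else {
          deleted.add(name);       // keep tracking the deletion
        }
      }
      return destroyable;
    }
  }

  public static void main(String[] args) {
    // Diff kept by "/" for s1: nothing changed directly under "/" between s1 and s2.
    SimpleChildrenDiff rootS1 = new SimpleChildrenDiff();
    // Diff kept by "/" for s2: dir1 was deleted after s2 was taken.
    SimpleChildrenDiff rootS2 = new SimpleChildrenDiff();
    rootS2.deleted.add("dir1");
    // Diff kept by "dir1" for s1: file2 was created after s1 (and before s2).
    SimpleChildrenDiff dir1S1 = new SimpleChildrenDiff();
    dir1S1.created.add("file2");

    // Deleting snapshot s2 only combines diffs of the SAME directory, so the delete
    // of dir1 (recorded on "/") never meets the create of file2 (recorded on dir1):
    // no create/delete pair is formed, and combination alone will not clean up file2.
    System.out.println("destroyed while combining on /: "
        + rootS1.combinePosterior(rootS2));   // prints []
    System.out.println("file2's create entry still dangling in dir1's diff: "
        + dir1S1.created);                    // prints [file2]
  }
}
{code}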


> incorrect snapshot directory diff generated by snapshot deletion
> ----------------------------------------------------------------
>
>                 Key: HDFS-6908
>                 URL: https://issues.apache.org/jira/browse/HDFS-6908
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>            Reporter: Juan Yu
>            Assignee: Juan Yu
>            Priority: Critical
>         Attachments: HDFS-6908.001.patch
>
>
> In the following scenario, deleting a snapshot can generate an incorrect 
> snapshot directory diff and a corrupted fsimage; if you restart the NN after 
> that, you will get a NullPointerException.
> 1. create a directory and create a file under it
> 2. take a snapshot
> 3. create another file under that directory
> 4. take second snapshot
> 5. delete both files and the directory
> 6. delete second snapshot
> An incorrect directory diff will be generated.
> Restarting the NN will then throw an NPE:
> {code}
> java.lang.NullPointerException
>       at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.addToDeletedList(FSImageFormatPBSnapshot.java:246)
>       at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDeletedList(FSImageFormatPBSnapshot.java:265)
>       at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadDirectoryDiffList(FSImageFormatPBSnapshot.java:328)
>       at org.apache.hadoop.hdfs.server.namenode.snapshot.FSImageFormatPBSnapshot$Loader.loadSnapshotDiffSection(FSImageFormatPBSnapshot.java:192)
>       at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:254)
>       at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
>       at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:208)
>       at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:906)
>       at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:892)
>       at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:715)
>       at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:653)
>       at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:276)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:882)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:629)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:498)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:554)
> {code}



