[ https://issues.apache.org/jira/browse/HDFS-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123987#comment-15123987 ]
Jing Zhao commented on HDFS-9406:
---------------------------------

Thanks for the patch, Yongjun! The patch looks good to me. But it looks like we need to fix {{TestINodeFile#testClearBlocks}} because of the new {{clearBlocks}} logic. In the meantime, can we also add a test for the case you mentioned in this jira? Although this one may not cause fsimage corruption, we can check whether the file has finally been deleted from the inodeMap.
{code}
  @Test
  public void testRenameAndDelete() throws IOException {
    final Path foo = new Path("/foo");
    final Path x = new Path(foo, "x");
    final Path y = new Path(foo, "y");
    final Path trash = new Path("/trash");
    fs.mkdirs(x);
    fs.mkdirs(y);
    fs.mkdirs(trash);
    fs.allowSnapshot(foo);
    // 1. create snapshot s0
    fs.createSnapshot(foo, "s0");

    // 2. create file /foo/x/bar
    final Path file = new Path(x, "bar");
    DFSTestUtil.createFile(fs, file, BLOCKSIZE, (short) 1, 0L);
    final long fileId = fsdir.getINode4Write(file.toString()).getId();

    // 3. move file into /foo/y
    final Path newFile = new Path(y, "bar");
    fs.rename(file, newFile);

    // 4. create snapshot s1
    fs.createSnapshot(foo, "s1");

    // 5. move /foo/y to /trash
    final Path deletedY = new Path(trash, "y");
    fs.rename(y, deletedY);

    // 6. create snapshot s2
    fs.createSnapshot(foo, "s2");

    // 7. delete /trash/y
    fs.delete(deletedY, true);

    // 8. delete snapshot s1
    fs.deleteSnapshot(foo, "s1");

    // make sure bar has been cleaned
    Assert.assertNull(fsdir.getInode(fileId));
  }
{code}
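For reference, here is a minimal scaffolding sketch for running the snippet above in a {{MiniDFSCluster}}-based test class. The {{fs}}, {{fsdir}}, and {{BLOCKSIZE}} members are hypothetical stand-ins that simply mirror the names used in the test; the actual enclosing test class in the patch may differ.
{code}
  // Hypothetical scaffolding only (not part of the patch): provides the
  // fs / fsdir / BLOCKSIZE members that testRenameAndDelete() refers to.
  // (imports assumed: org.apache.hadoop.conf.Configuration,
  //  org.apache.hadoop.hdfs.*, org.apache.hadoop.hdfs.server.namenode.FSDirectory,
  //  org.junit.*)
  private static final long BLOCKSIZE = 1024;
  private MiniDFSCluster cluster;
  private DistributedFileSystem fs;
  private FSDirectory fsdir;

  @Before
  public void setUp() throws Exception {
    final Configuration conf = new Configuration();
    conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, BLOCKSIZE);
    cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    cluster.waitActive();
    fs = cluster.getFileSystem();
    fsdir = cluster.getNamesystem().getFSDirectory();
  }

  @After
  public void tearDown() throws Exception {
    if (cluster != null) {
      cluster.shutdown();
    }
  }
{code}
This follows the usual MiniDFSCluster setup pattern in the existing snapshot tests, so it should slot into whichever test class the new case ends up in.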
> FSImage corruption after taking snapshot
> ----------------------------------------
>
>                 Key: HDFS-9406
>                 URL: https://issues.apache.org/jira/browse/HDFS-9406
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>         Environment: CentOS 6 amd64, CDH 5.4.4-1
> 2xCPU: Intel(R) Xeon(R) CPU E5-2640 v3
> Memory: 32GB
> Namenode blocks: ~700_000 blocks, no HA setup
>            Reporter: Stanislav Antic
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-9406.001.patch, HDFS-9406.002.patch
>
>
> FSImage corruption happened after HDFS snapshots were taken. The cluster was not in use at that time.
> When the namenode restarted it reported a NullPointerException:
> {code}
> 15/11/07 10:01:15 INFO namenode.FileJournalManager: Recovering unfinalized segments in /tmp/fsimage_checker_5857/fsimage/current
> 15/11/07 10:01:15 INFO namenode.FSImage: No edit log streams selected.
> 15/11/07 10:01:18 INFO namenode.FSImageFormatPBINode: Loading 1370277 INodes.
> 15/11/07 10:01:27 ERROR namenode.NameNode: Failed to start namenode.
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.addChild(INodeDirectory.java:531)
>         at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:252)
>         at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:202)
>         at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:261)
>         at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180)
>         at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1061)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:765)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:643)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:810)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:794)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1487)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1553)
> 15/11/07 10:01:27 INFO util.ExitUtil: Exiting with status 1
> {code}
> The corruption happened after "07.11.2015 00:15", and after that time ~9300 blocks were invalidated that shouldn't have been.
> After recovering the FSImage I discovered that ~9300 blocks were missing.
> -I also attached the namenode log from before and after the corruption happened.-

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)