[ 
https://issues.apache.org/jira/browse/HDFS-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998734#comment-14998734
 ] 

Yongjun Zhang commented on HDFS-9406:
-------------------------------------

HI [[email protected]],

Thanks for reporting the issue. Do you have an older fsimage that can be loaded 
successfully and all the edit logs on top of that good fsimage that we can 
replay to reproduce the corrupted fsimage? It'd be great if you do.  Thanks.

.


> FSImage corruption after taking snapshot
> ----------------------------------------
>
>                 Key: HDFS-9406
>                 URL: https://issues.apache.org/jira/browse/HDFS-9406
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>         Environment: CentOS 6 amd64, CDH 5.4.4-1
> 2xCPU: Intel(R) Xeon(R) CPU E5-2640 v3
> Memory: 32GB
> Namenode blocks: ~700_000 blocks, no HA setup
>            Reporter: Stanislav Antic
>
> FSImage corruption happened after HDFS snapshots were taken. Cluster was not 
> used
> at that time.
> When namenode restarts it reported NULL pointer exception:
> {code}
> 15/11/07 10:01:15 INFO namenode.FileJournalManager: Recovering unfinalized 
> segments in /tmp/fsimage_checker_5857/fsimage/current
> 15/11/07 10:01:15 INFO namenode.FSImage: No edit log streams selected.
> 15/11/07 10:01:18 INFO namenode.FSImageFormatPBINode: Loading 1370277 INodes.
> 15/11/07 10:01:27 ERROR namenode.NameNode: Failed to start namenode.
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.addChild(INodeDirectory.java:531)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:252)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:202)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:261)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1061)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:765)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:643)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:810)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:794)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1487)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1553)
> 15/11/07 10:01:27 INFO util.ExitUtil: Exiting with status 1
> {code}
> Corruption happened after "07.11.2015 00:15", and after that time blocks 
> ~9300 blocks were invalidated that shouldn't be.
> After recovering FSimage I discovered that around ~9300 blocks were missing.
> -I also attached log of namenode before and after corruption happened.-



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to