zhaoyunjiong created HDFS-7470:
----------------------------------

             Summary: SecondaryNameNode need twice memory when calling reloadFromImageFile
                 Key: HDFS-7470
                 URL: https://issues.apache.org/jira/browse/HDFS-7470
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: zhaoyunjiong
            Assignee: zhaoyunjiong

histo information at 2014-12-02 01:19
{quote}
 num     #instances         #bytes  class name
----------------------------------------------
   1:     186449630    19326123016  [Ljava.lang.Object;
   2:     157366649    15107198304  org.apache.hadoop.hdfs.server.namenode.INodeFile
   3:     183409030    11738177920  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
   4:     157358401     5244264024  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
   5:             3     3489661000  [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
   6:      29253275     1872719664  [B
   7:       3230821      284312248  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
   8:       2756284      110251360  java.util.ArrayList
   9:        469158       22519584  org.apache.hadoop.fs.permission.AclEntry
  10:           847       17133032  [Ljava.util.HashMap$Entry;
  11:        188471       17059632  [C
  12:        314614       10067656  [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
  13:        234579        9383160  com.google.common.collect.RegularImmutableList
  14:         49584        6850280  <constMethodKlass>
  15:         49584        6356704  <methodKlass>
  16:        187270        5992640  java.lang.String
  17:        234579        5629896  org.apache.hadoop.hdfs.server.namenode.AclFeature
{quote}

histo information at 2014-12-02 01:32
{quote}
 num     #instances         #bytes  class name
----------------------------------------------
   1:     355838051    35566651032  [Ljava.lang.Object;
   2:     302272758    29018184768  org.apache.hadoop.hdfs.server.namenode.INodeFile
   3:     352500723    22560046272  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
   4:     302264510    10075087952  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
   5:     177120233     9374983920  [B
   6:             3     3489661000  [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
   7:       6191688      544868544  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
   8:       2799256      111970240  java.util.ArrayList
   9:        890728       42754944  org.apache.hadoop.fs.permission.AclEntry
  10:        330986       29974408  [C
  11:        596871       19099880  [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
  12:        445364       17814560  com.google.common.collect.RegularImmutableList
  13:           844       17132816  [Ljava.util.HashMap$Entry;
  14:        445364       10688736  org.apache.hadoop.hdfs.server.namenode.AclFeature
  15:        329789       10553248  java.lang.String
  16:         91741        8807136  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction
  17:         49584        6850280  <constMethodKlass>
{quote}

And the stack trace shows it was doing reloadFromImageFile:
{quote}
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getInode(FSDirectory.java:2426)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:160)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:121)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:902)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:888)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.reloadFromImageFile(FSImage.java:562)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:1048)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:536)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:388)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:354)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:356)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1630)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:413)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:350)
	at java.lang.Thread.run(Thread.java:745)
{quote}

So before calling reloadFromImageFile, I think we need to release the old namesystem to prevent the SecondaryNameNode from running out of memory (OOM).
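
The idea of releasing the old namesystem before reloading can be illustrated with a minimal sketch in plain Java. This is not the HDFS code: `ReloadSketch`, `Namespace`, `loadImage`, `reloadKeepingOld` and `reloadAfterRelease` are hypothetical names invented for the illustration, and the "peak" counter only simulates how many namespace copies are reachable during a reload rather than measuring real heap usage.

```java
import java.util.ArrayList;
import java.util.List;

public class ReloadSketch {
    /** Hypothetical stand-in for the namespace the checkpointer keeps in memory. */
    static final class Namespace {
        final List<String> inodes = new ArrayList<>();
    }

    private Namespace current;      // plays the role of the "old namesystem"
    private int peakReachable = 0;  // max namespace copies reachable at once during a load

    /** Builds a fresh namespace and records how many copies are reachable meanwhile. */
    private Namespace loadImage(int inodeCount) {
        Namespace fresh = new Namespace();
        for (int i = 0; i < inodeCount; i++) {
            fresh.inodes.add("inode-" + i);
        }
        // The old copy counts only if something still references it.
        int reachable = (current != null ? 1 : 0) + 1;
        peakReachable = Math.max(peakReachable, reachable);
        return fresh;
    }

    /** Reload while the old namespace is still referenced: both copies coexist. */
    void reloadKeepingOld(int inodeCount) {
        current = loadImage(inodeCount);
    }

    /** Drop the old reference first, so the GC is free to reclaim it during the load. */
    void reloadAfterRelease(int inodeCount) {
        current = null;
        current = loadImage(inodeCount);
    }

    int peakReachable() {
        return peakReachable;
    }

    public static void main(String[] args) {
        ReloadSketch keep = new ReloadSketch();
        keep.reloadKeepingOld(1000);    // initial load
        keep.reloadKeepingOld(1000);    // reload with the old copy still held
        System.out.println("keeping old:   peak copies = " + keep.peakReachable());

        ReloadSketch release = new ReloadSketch();
        release.reloadAfterRelease(1000);
        release.reloadAfterRelease(1000);
        System.out.println("release first: peak copies = " + release.peakReachable());
    }
}
```

In the first variant two copies are reachable during the second load, which matches the roughly doubled INodeFile and BlockInfo counts between the two histograms; in the second variant only one copy is reachable at any time. Whether the real fix can simply null out a reference or has to clear the existing structures in place depends on the actual FSNamesystem code, which this sketch does not model.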