[ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616634#comment-13616634 ]
Todd Lipcon commented on HDFS-4489:
-----------------------------------

Here are the results from the latest patch:

h2. Setup

- Java 6u31 configured with a 24 GB heap (-Xms24g -Xmx24g)
- The fsimage is 4.1 GB on disk, a snapshot from a mid-size production cluster that runs both HBase and some MR workloads.
- 31249022 files and directories, 26525575 blocks = 57774597 total filesystem objects.

In each test, I started the NameNode, waited until it had loaded the image and opened its IPC port, and then ran "jmap -histo:live", which issues a full GC and reports heap usage statistics.

h2. 2.0.3-beta release

Total heap: 7069MB

Top consumers
{code}
 num     #instances         #bytes  class name
----------------------------------------------
   1:      38421509     2049194112  [Ljava.lang.Object;
   2:      26525179     1485410024  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
   3:      19134601     1071537656  org.apache.hadoop.hdfs.server.namenode.INodeFile
   4:      16228949      753517120  [B
   5:      12113580      581451840  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
   6:      19135442      484175352  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
   7:       1621399      403948560  [I
   8:      11895039      285480936  java.util.ArrayList
   9:             1      268435472  [Lorg.apache.hadoop.hdfs.util.LightWeightGSet$LinkedElement;
{code}

h2. Patched trunk with the map turned off

Total heap: 7528MB (6.5% increase from 2.0.3)

Top consumers
{code}
 num     #instances         #bytes  class name
----------------------------------------------
   1:      38421427     2049187584  [Ljava.lang.Object;
   2:      26525179     1485410024  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
   3:      19134601     1377691272  org.apache.hadoop.hdfs.server.namenode.INodeFile
   4:      12113580      775269120  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
   5:      16228690      753509864  [B
   6:      19135442      484175352  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
   7:       1654298      384726200  [I
   8:      11895040      285480960  java.util.ArrayList
   9:             1      268435472  [Lorg.apache.hadoop.hdfs.util.LightWeightGSet$LinkedElement;
{code}

h2. Patched trunk with the map turned on

Total heap: 7696MB (8.9% increase from 2.0.3)

Top consumers
{code}
 num     #instances         #bytes  class name
----------------------------------------------
   1:      38421429     2049187632  [Ljava.lang.Object;
   2:      26525179     1485410024  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
   3:      19134601     1377691272  org.apache.hadoop.hdfs.server.namenode.INodeFile
   4:      12113580      775269120  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
   5:      16228746      753515976  [B
   6:      19135442      484175352  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
   7:       1499494      426158720  [I
   8:             2      402653216  [Lorg.apache.hadoop.hdfs.util.LightWeightGSet$LinkedElement;
   9:      11895040      285480960  java.util.ArrayList
{code}

I don't think this increased memory is necessarily unacceptable; I just wanted to see a true measurement of the overhead instead of hypotheses. It looks like the increased memory cost is about twice what was estimated above.

> Use InodeID as an identifier of a file in HDFS protocols and APIs
> ------------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>
> Using InodeID to uniquely identify a file has several benefits. Here are a few of them:
> 1. Uniquely identify a file across renames; related JIRAs include HDFS-4258 and HDFS-4437.
> 2. Modification checks in tools like distcp. Since a file could have been replaced or renamed to, the combination of file name and size is not reliable, but the combination of file id and size is unique.
> 3. Id-based protocol support (e.g., NFS).
> 4. Make the pluggable block placement policy use fileid instead of filename (HDFS-385).
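
The percentages and byte counts in the comment above can be cross-checked with a little arithmetic. The following is only a rough sketch in Python: the numbers are copied from the quoted jmap histograms, while the names (`HEAP_MB`, `INODE_CLASSES`, `GSET_ARRAY_BYTES`, `pct_increase`, `per_instance_growth`) are hypothetical helpers introduced here for illustration and are not part of HDFS or jmap.

```python
# Figures copied from the "jmap -histo:live" output quoted in the comment.
# Every name in this file is a hypothetical helper, not an HDFS or jmap API.

# Total live heap per run, in MB.
HEAP_MB = {"2.0.3-beta": 7069, "map off": 7528, "map on": 7696}

# class -> (instances, bytes in 2.0.3-beta, bytes in patched trunk)
INODE_CLASSES = {
    "INodeFile": (19134601, 1071537656, 1377691272),
    "INodeDirectory": (12113580, 581451840, 775269120),
}

# Total bytes held by LightWeightGSet$LinkedElement[] arrays in patched trunk.
GSET_ARRAY_BYTES = {"map off": 268435472, "map on": 402653216}


def pct_increase(base, new):
    """Percentage growth from base to new, rounded to one decimal place."""
    return round((new - base) / base * 100, 1)


def per_instance_growth(instances, old_bytes, new_bytes):
    """Average extra bytes per instance between the two builds."""
    return (new_bytes - old_bytes) / instances


if __name__ == "__main__":
    base = HEAP_MB["2.0.3-beta"]
    for label in ("map off", "map on"):
        print(f"{label}: +{pct_increase(base, HEAP_MB[label])}% heap")
    for name, (count, old, new) in INODE_CLASSES.items():
        print(f"{name}: {old / count:.0f} -> {new / count:.0f} bytes per instance")
    # Assumes the first array is the same size in both runs.
    extra = GSET_ARRAY_BYTES["map on"] - GSET_ARRAY_BYTES["map off"]
    print(f"second GSet array: {extra / 2**20:.1f} MiB")
```

On these figures, INodeFile goes from 56 to 72 bytes per instance and INodeDirectory from 48 to 64, so each inode object grows by 16 bytes on average. Assuming the first LinkedElement array is unchanged between runs, the second one adds roughly 128 MiB with the map turned on.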