[ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616634#comment-13616634 ]

Todd Lipcon commented on HDFS-4489:
-----------------------------------

Here's the results from the latest patch:

h2. Setup
- Java 6u31 configured with a 24GB heap (-Xms24g -Xmx24g)
- fsimage is 4.1GB on disk, a snapshot from a mid-size production cluster which 
runs both HBase and some MR workloads.
- 31249022 files and directories, 26525575 blocks = 57774597 total filesystem 
objects.

In each test, I started the NameNode, waited until it had loaded the image and 
opened its IPC port, and then used "jmap -histo:live", which issues a full GC 
and reports heap usage statistics.
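
(The tables below are just the top rows of the raw histogram output. For 
anyone who wants to reproduce the comparison, a throwaway parser along these 
lines works -- my own sketch, not part of the patch; it assumes jmap's usual 
rank / #instances / #bytes / class-name column layout:)

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Usage: jmap -histo:live <namenode-pid> | java HistoTop
public class HistoTop {
  public static void main(String[] args) throws Exception {
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    long total = 0;
    String line;
    while ((line = in.readLine()) != null) {
      String[] cols = line.trim().split("\\s+");
      // Data rows look like: "1:  38421509  2049194112  [Ljava.lang.Object;"
      // (the header, separator, and trailing "Total" lines don't end in ':')
      if (cols.length >= 4 && cols[0].endsWith(":")) {
        int rank = Integer.parseInt(cols[0].substring(0, cols[0].length() - 1));
        long bytes = Long.parseLong(cols[2]);
        total += bytes;
        if (rank <= 9) {
          System.out.printf("%4d: %12s %14d  %s%n", rank, cols[1], bytes, cols[3]);
        }
      }
    }
    System.out.printf("Accounted live heap: %dMB%n", total >> 20);
  }
}
{code}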

h2. 2.0.3-beta release
Total heap: 7069MB

Top consumers
{code}
 num     #instances         #bytes  class name
----------------------------------------------
   1:      38421509     2049194112  [Ljava.lang.Object;
   2:      26525179     1485410024  
org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
   3:      19134601     1071537656  
org.apache.hadoop.hdfs.server.namenode.INodeFile
   4:      16228949      753517120  [B
   5:      12113580      581451840  
org.apache.hadoop.hdfs.server.namenode.INodeDirectory
   6:      19135442      484175352  
[Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
   7:       1621399      403948560  [I
   8:      11895039      285480936  java.util.ArrayList
   9:             1      268435472  
[Lorg.apache.hadoop.hdfs.util.LightWeightGSet$LinkedElement;
{code}

h2. Patched trunk with the map turned off
Total heap: 7528MB (6.5% increase from 2.0.3)

Top consumers
{code}
 num     #instances         #bytes  class name
----------------------------------------------
   1:      38421427     2049187584  [Ljava.lang.Object;
   2:      26525179     1485410024  
org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
   3:      19134601     1377691272  
org.apache.hadoop.hdfs.server.namenode.INodeFile
   4:      12113580      775269120  
org.apache.hadoop.hdfs.server.namenode.INodeDirectory
   5:      16228690      753509864  [B
   6:      19135442      484175352  
[Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
   7:       1654298      384726200  [I
   8:      11895040      285480960  java.util.ArrayList
   9:             1      268435472  
[Lorg.apache.hadoop.hdfs.util.LightWeightGSet$LinkedElement;
{code}

h2. Patched trunk with the map turned on
Total heap: 7696MB (8.9% increase from 2.0.3)

Top consumers
{code}
 num     #instances         #bytes  class name
----------------------------------------------
   1:      38421429     2049187632  [Ljava.lang.Object;
   2:      26525179     1485410024  
org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
   3:      19134601     1377691272  
org.apache.hadoop.hdfs.server.namenode.INodeFile
   4:      12113580      775269120  
org.apache.hadoop.hdfs.server.namenode.INodeDirectory
   5:      16228746      753515976  [B
   6:      19135442      484175352  
[Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
   7:       1499494      426158720  [I
   8:             2      402653216  
[Lorg.apache.hadoop.hdfs.util.LightWeightGSet$LinkedElement;
   9:      11895040      285480960  java.util.ArrayList
{code}
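
Doing the division on the rows above shows where the growth lands. This is 
just my back-of-the-envelope arithmetic from the histograms, not an 
additional measurement:

{code}
public class OverheadMath {
  public static void main(String[] args) {
    // Per-instance sizes derived from the histograms above:
    //   INodeFile:      1071537656/19134601 = 56 B -> 1377691272/19134601 = 72 B (+16 B)
    //   INodeDirectory:  581451840/12113580 = 48 B ->  775269120/12113580 = 64 B (+16 B)
    long inodeGrowth = 16L * (19134601L + 12113580L);   // ~477MB across ~31.2M inodes

    // With the map turned on, a second LightWeightGSet$LinkedElement[] appears
    // (402653216 B in 2 arrays vs 268435472 B in 1) -- presumably the new
    // inode map's hash table:
    long gsetGrowth = 402653216L - 268435472L;          // 128MB exactly

    System.out.printf("inode growth ~%dMB, gset map growth ~%dMB%n",
        inodeGrowth >> 20, gsetGrowth >> 20);
  }
}
{code}

That roughly matches the totals: the map-off build is ~459MB over 2.0.3 
(mostly the extra 16 bytes per inode), and turning the map on costs another 
~168MB (the 128MB second GSet table plus some growth in [I).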


I don't think this increased memory is necessarily unacceptable; I just wanted 
to see a true measurement of the overhead instead of hypotheses. It looks like 
the increased memory cost is about twice what was estimated above.
                
> Use InodeID as an identifier of a file in HDFS protocols and APIs
> -----------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>
> The benefits of using InodeID to uniquely identify a file are manifold. 
> Here are a few of them:
> 1. uniquely identifying a file across renames; related JIRAs include 
> HDFS-4258 and HDFS-4437.
> 2. modification checks in tools like distcp. Since a file could have been 
> replaced or renamed, the combination of file name and size is not reliable, 
> but the combination of file id and size is unique.
> 3. id-based protocol support (e.g., NFS)
> 4. making the pluggable block placement policy use file id instead of 
> file name (HDFS-385).
