[ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643745#comment-13643745 ]
Suresh Srinivas commented on HDFS-4489:
---------------------------------------

I ran Slive tests. Even with a very small amount of data written, I could not find a perceptible difference between the test runs, since any additional time spent in NN methods is dwarfed by the overall cost of calling the NN over RPC, etc. So I decided to run NNThroughputBenchmark. For folks new to it, it is a micro-benchmark that does not use RPC and executes operations directly on the namenode class; hence it gives comparisons sharply limited to the NN method calls alone. I ran NNThroughputBenchmark to create 100K files using 100 threads in each iteration, with the command below:

{noformat}
bin/hadoop jar share/hadoop/hdfs/hadoop-hdfs-2.0.5-SNAPSHOT-tests.jar org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -op create -threads 100 -files 100000 -filesPerDir 100
{noformat}

*Without this patch:*
||Operations||Elapsed||OpsPerSec||AvgTime||
|100000|20327|4919.565110444237|20|
|100000|19199|5208.604614823688|19|
|100000|19287|5184.839529216571|19|
|100000|19128|5227.9381012128815|19|
|100000|19082|5240.540823813018|19|
|100000|18785|5323.396326856535|18|
|100000|18947|5277.880403230063|18|
|100000|18963|5273.427200337499|18|
|100000|19206|5206.706237634073|19|
|100000|19434|5145.621076463929|19|
|Average|19235.8|5200.851942|18.8|

*With this patch:*
||Operations||Elapsed||OpsPerSec||AvgTime||
|100000|20104|4974.134500596896|19|
|100000|19498|5128.731151913017|19|
|100000|19449|5141.652527122217|19|
|100000|19530|5120.327700972863|19|
|100000|20067|4983.305925150745|19|
|100000|19703|5075.369233111709|19|
|100000|19595|5103.342689461598|19|
|100000|19418|5149.860953754249|19|
|100000|19932|5017.057997190447|19|
|100000|20596|4855.311711011847|20|
|Average|19789.2|5054.909439|19.1|

*With this patch + an additional change to turn off INodeMap:*
||Operations||Elapsed||OpsPerSec||AvgTime||
|100000|19615|5098.139179199592|19|
|100000|19349|5168.225748100677|19|
|100000|19136|5225.752508361204|19|
|100000|19347|5168.760014472528|19|
|100000|20096|4976.114649681529|19|
|100000|19248|5195.344970906068|19|
|100000|18916|5286.529921759357|18|
|100000|19217|5203.7258677212885|19|
|100000|20105|4973.887092762994|20|
|100000|19882|5029.675082989639|19|
|Average|19491.1|5132.615504|19|

> Use InodeID as an identifier of a file in HDFS protocols and APIs
> -----------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>             Fix For: 2.0.5-beta
>
> The benefit of using InodeID to uniquely identify a file can be multi-fold. Here are a few examples:
> 1. Uniquely identify a file across renames; related JIRAs include HDFS-4258 and HDFS-4437.
> 2. Modification checks in tools like distcp. Since a file could have been replaced or renamed to, the combination of file name and size is not reliable, but the combination of file id and size is unique.
> 3. Id-based protocol support (e.g., NFS).
> 4. Make the pluggable block placement policy use file id instead of file name (HDFS-385).
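As a cross-check on the summary rows in the tables above, here is a short sketch I wrote (not part of the benchmark itself): each run creates 100000 files, per-run OpsPerSec is files divided by elapsed seconds, and the Average row is the arithmetic mean over the 10 runs.

```python
# Sketch: recompute the "Average" rows from the raw Elapsed columns above.
FILES = 100_000

elapsed_ms = {
    "without patch":           [20327, 19199, 19287, 19128, 19082, 18785, 18947, 18963, 19206, 19434],
    "with patch":              [20104, 19498, 19449, 19530, 20067, 19703, 19595, 19418, 19932, 20596],
    "with patch, no INodeMap": [19615, 19349, 19136, 19347, 20096, 19248, 18916, 19217, 20105, 19882],
}

for name, runs in elapsed_ms.items():
    avg_elapsed = sum(runs) / len(runs)  # matches the Elapsed average row
    # OpsPerSec per run is FILES / elapsed-in-seconds; average over the 10 runs.
    avg_ops = sum(FILES / (ms / 1000.0) for ms in runs) / len(runs)
    print(f"{name}: avg elapsed {avg_elapsed:.1f} ms, {avg_ops:.6f} ops/sec")
```

Running this reproduces the summary rows (19235.8 ms / ~5200.85 ops/sec without the patch, 19789.2 ms / ~5054.91 with it, 19491.1 ms / ~5132.62 with INodeMap off), i.e. roughly a 2-3% slowdown from the patch that is partially recovered when INodeMap is disabled.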