[ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643745#comment-13643745 ]
Suresh Srinivas commented on HDFS-4489:
---------------------------------------

I ran Slive tests. Even with a very small amount of data written, I could not find a perceptible difference between the test runs, since any additional time spent in NN methods is dwarfed by the overall cost of calling the NN over RPC, etc. So I decided to run NNThroughputBenchmark. For folks new to it, it is a micro-benchmark that does not use RPC and executes operations directly on the namenode class; hence it gives comparisons sharply limited to the NN method calls alone. I ran NNThroughputBenchmark to create 100K files using 100 threads in each iteration, with the command below:

{noformat}
bin/hadoop jar share/hadoop/hdfs/hadoop-hdfs-2.0.5-SNAPSHOT-tests.jar org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -op create -threads 100 -files 100000 -filesPerDir 100
{noformat}

*Without this patch:*
||Operations||Elapsed||OpsPerSec||AvgTime||
|100000|20327|4919.565110444237|20|
|100000|19199|5208.604614823688|19|
|100000|19287|5184.839529216571|19|
|100000|19128|5227.9381012128815|19|
|100000|19082|5240.540823813018|19|
|100000|18785|5323.396326856535|18|
|100000|18947|5277.880403230063|18|
|100000|18963|5273.427200337499|18|
|100000|19206|5206.706237634073|19|
|100000|19434|5145.621076463929|19|
|Average|19235.8|5200.851942|18.8|

*With this patch:*
||Operations||Elapsed||OpsPerSec||AvgTime||
|100000|20104|4974.134500596896|19|
|100000|19498|5128.731151913017|19|
|100000|19449|5141.652527122217|19|
|100000|19530|5120.327700972863|19|
|100000|20067|4983.305925150745|19|
|100000|19703|5075.369233111709|19|
|100000|19595|5103.342689461598|19|
|100000|19418|5149.860953754249|19|
|100000|19932|5017.057997190447|19|
|100000|20596|4855.311711011847|20|
|Average|19789.2|5054.909439|19.1|

*With this patch + an additional change to turn off INodeMap:*
||Operations||Elapsed||OpsPerSec||AvgTime||
|100000|19615|5098.139179199592|19|
|100000|19349|5168.225748100677|19|
|100000|19136|5225.752508361204|19|
|100000|19347|5168.760014472528|19|
|100000|20096|4976.114649681529|19|
|100000|19248|5195.344970906068|19|
|100000|18916|5286.529921759357|18|
|100000|19217|5203.7258677212885|19|
|100000|20105|4973.887092762994|20|
|100000|19882|5029.675082989639|19|
|Average|19491.1|5132.615504|19|

> Use InodeID as an identifier of a file in HDFS protocols and APIs
> -----------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>             Fix For: 2.0.5-beta
>
> The benefit of using InodeID to uniquely identify a file can be multi-fold. Here are a few examples:
> 1. Uniquely identify a file across renames; related JIRAs include HDFS-4258 and HDFS-4437.
> 2. Modification checks in tools like distcp. Since a file could have been replaced or renamed to, the combination of file name and size is not reliable, but the combination of file id and size is unique.
> 3. Id-based protocol support (e.g., NFS).
> 4. Make the pluggable block placement policy use file id instead of file name (HDFS-385).
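As a cross-check on the summary rows in the tables above, here is a short sketch I wrote (not part of the benchmark itself): each run creates 100000 files, per-run OpsPerSec is files divided by elapsed seconds, and the Average row is the arithmetic mean over the 10 runs.

```python
# Sketch: recompute the "Average" rows from the raw Elapsed columns above.
FILES = 100_000

elapsed_ms = {
    "without patch":           [20327, 19199, 19287, 19128, 19082, 18785, 18947, 18963, 19206, 19434],
    "with patch":              [20104, 19498, 19449, 19530, 20067, 19703, 19595, 19418, 19932, 20596],
    "with patch, no INodeMap": [19615, 19349, 19136, 19347, 20096, 19248, 18916, 19217, 20105, 19882],
}

for name, runs in elapsed_ms.items():
    avg_elapsed = sum(runs) / len(runs)  # matches the Elapsed average row
    # OpsPerSec per run is FILES / elapsed-in-seconds; average over the 10 runs.
    avg_ops = sum(FILES / (ms / 1000.0) for ms in runs) / len(runs)
    print(f"{name}: avg elapsed {avg_elapsed:.1f} ms, {avg_ops:.6f} ops/sec")
```

Running this reproduces the summary rows (19235.8 ms / ~5200.85 ops/sec without the patch, 19789.2 ms / ~5054.91 with it, 19491.1 ms / ~5132.62 with INodeMap off), i.e. roughly a 2-3% slowdown from the patch that is partially recovered when INodeMap is disabled.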