[
https://issues.apache.org/jira/browse/HDFS-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127401#comment-13127401
]
Uma Maheswara Rao G commented on HDFS-1447:
-------------------------------------------
Updated patch for review!
Test results:
I created 10000 block files and meta files in one directory.
Verified the addToReplicasMap method with and without the code changes.
Found roughly a 27x improvement.
Performance stats with and without code changes:
*With Fix:*
addToReplicasMapWithNewChange Test completed in 969
addToReplicasMapWithNewChange Test completed in 703
addToReplicasMapWithNewChange Test completed in 672
addToReplicasMapWithNewChange Test completed in 641
addToReplicasMapWithNewChange Test completed in 640
addToReplicasMapWithNewChange Test completed in 672
addToReplicasMapWithNewChange Test completed in 625
addToReplicasMapWithNewChange Test completed in 641
addToReplicasMapWithNewChange Test completed in 641
addToReplicasMapWithNewChange Test completed in 656
*Average Performance : 686*
*Without Fix:*
addToReplicasMapWithOldCode Test completed in 19516
addToReplicasMapWithOldCode Test completed in 19172
addToReplicasMapWithOldCode Test completed in 19126
addToReplicasMapWithOldCode Test completed in 19078
addToReplicasMapWithOldCode Test completed in 19079
addToReplicasMapWithOldCode Test completed in 19094
addToReplicasMapWithOldCode Test completed in 19188
addToReplicasMapWithOldCode Test completed in 19157
addToReplicasMapWithOldCode Test completed in 19125
addToReplicasMapWithOldCode Test completed in 19079
*Average Performance : 19161*
We see a ~27x improvement for 10000 files in a directory. The improvement
grows as the number of files per directory increases.
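For context, the numbers above come from the attached Test_HDFS_1447_NotForCommitt.java. A minimal sketch of that kind of repeated timing loop is shown below; buildReplicaMap(dir) is a placeholder for the code path being measured, not an actual HDFS method, and if the harness times with System.currentTimeMillis() as sketched here the figures above would be milliseconds.
{code:java}
// Placeholder timing loop: run the scan 10 times over a directory
// pre-populated with 10000 block and meta files, printing elapsed time.
File dir = new File("/path/to/directory/with/10000/blocks"); // placeholder path
for (int run = 0; run < 10; run++) {
  long start = System.currentTimeMillis();
  buildReplicaMap(dir);   // placeholder for the addToReplicasMap code under test
  long elapsed = System.currentTimeMillis() - start;
  System.out.println("addToReplicasMap Test completed in " + elapsed);
}
{code}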
Approach:
The scan is done in 2 iterations over the directory listing.
In the first iteration we compute the blockID, genStamp, and block file length
for every file and populate the corresponding data structures.
In the second iteration, which covers only about half of the files (the block
files), we construct the Replica and add it to the volume map, as sketched below.
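A minimal sketch of the two-pass idea, using standalone placeholder names (ScannedReplica, scan) rather than the actual HDFS classes touched by the patch, and ignoring subdirectories for brevity:
{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoPassReplicaScan {

  /** Placeholder for the replica info that ends up in the volume map. */
  static class ScannedReplica {
    final long blockId;
    final long genStamp;
    final long numBytes;
    ScannedReplica(long blockId, long genStamp, long numBytes) {
      this.blockId = blockId;
      this.genStamp = genStamp;
      this.numBytes = numBytes;
    }
  }

  public static Map<Long, ScannedReplica> scan(File dir) {
    File[] files = dir.listFiles();
    Map<Long, ScannedReplica> volumeMap = new HashMap<>();
    if (files == null) {
      return volumeMap;
    }

    Map<Long, Long> genStamps = new HashMap<>();  // blockId -> generation stamp
    List<File> blockFiles = new ArrayList<>();    // block files only

    // Pass 1: parse every file name once. Meta files ("blk_<id>_<genStamp>.meta")
    // give the generation stamp; block files ("blk_<id>") are kept for pass 2.
    for (File f : files) {
      String name = f.getName();
      if (name.endsWith(".meta")) {
        String core = name.substring("blk_".length(), name.length() - ".meta".length());
        int sep = core.lastIndexOf('_');
        long blockId = Long.parseLong(core.substring(0, sep));
        long genStamp = Long.parseLong(core.substring(sep + 1));
        genStamps.put(blockId, genStamp);
      } else if (name.startsWith("blk_")) {
        blockFiles.add(f);
      }
    }

    // Pass 2: only the block files (roughly half the listing). The generation
    // stamp is an O(1) map lookup instead of a re-walk of the directory.
    for (File blockFile : blockFiles) {
      long blockId = Long.parseLong(blockFile.getName().substring("blk_".length()));
      Long gs = genStamps.get(blockId);
      long genStamp = (gs != null) ? gs : 0;  // 0 stands in for "no meta file found"
      volumeMap.put(blockId, new ScannedReplica(blockId, genStamp, blockFile.length()));
    }
    return volumeMap;
  }
}
{code}
The point of the refactoring is that each directory listing is parsed exactly once, instead of being re-scanned by getGenerationStampFromFile() for every block file.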
Thanks
Uma
> Make getGenerationStampFromFile() more efficient, so it doesn't reprocess
> full directory listing for every block
> ----------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-1447
> URL: https://issues.apache.org/jira/browse/HDFS-1447
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: data-node
> Affects Versions: 0.20.2
> Reporter: Matt Foley
> Assignee: Matt Foley
> Attachments: HDFS-1447.patch, Test_HDFS_1447_NotForCommitt.java.patch
>
>
> Make getGenerationStampFromFile() more efficient. Currently this routine is
> called by addToReplicasMap() for every blockfile in the directory tree, and
> it walks each file's containing directory on every call. There is a simple
> refactoring that should make it more efficient.
> This work item is one of four sub-tasks for HDFS-1443, Improve Datanode
> startup time.
> The fix will probably be folded into sibling task HDFS-1446, which is already
> refactoring the method that calls getGenerationStampFromFile().