[ https://issues.apache.org/jira/browse/HDFS-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129829#comment-13129829 ]
Todd Lipcon commented on HDFS-1447:
-----------------------------------
Also, what percentage of startup time does this scan spend on CPU? In
HDFS-2384 I uploaded a C program which does the block scan as efficiently as
possible, but most of the gains there come from sorting by inode number
before statting.
> Make getGenerationStampFromFile() more efficient, so it doesn't reprocess
> full directory listing for every block
> ----------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-1447
> URL: https://issues.apache.org/jira/browse/HDFS-1447
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: data-node
> Affects Versions: 0.20.2
> Reporter: Matt Foley
> Assignee: Matt Foley
> Attachments: HDFS-1447.patch, Test_HDFS_1447_NotForCommitt.java.patch
>
>
> Make getGenerationStampFromFile() more efficient. Currently this routine is
> called by addToReplicasMap() for every block file in the directory tree, and
> it rescans the full listing of the file's containing directory on every
> call. A simple refactoring should make it more efficient.
> This work item is one of four sub-tasks for HDFS-1443, Improve Datanode
> startup time.
> The fix will probably be folded into sibling task HDFS-1446, which is already
> refactoring the method that calls getGenerationStampFromFile().
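
The refactoring described above can be sketched roughly as follows: scan each
directory listing once, build a map from block file name to generation stamp,
and answer each per-block lookup from that map instead of rescanning the
listing. This is only an illustration, not the code in HDFS-1447.patch. The
class GenStampIndex, its methods, and the 0 placeholder for a missing meta
file are hypothetical, and the sketch assumes meta files are named
blk_<id>_<genstamp>.meta.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not taken from HDFS-1447.patch.
final class GenStampIndex {
    private static final String BLOCK_PREFIX = "blk_";
    private static final String META_SUFFIX = ".meta";
    // Assumed placeholder returned when no meta file is listed for a block.
    static final long NO_GENERATION_STAMP = 0L;

    /** Single pass over a directory listing: block file name -> generation stamp. */
    static Map<String, Long> indexGenerationStamps(String[] listing) {
        Map<String, Long> stamps = new HashMap<>();
        for (String name : listing) {
            if (!name.startsWith(BLOCK_PREFIX) || !name.endsWith(META_SUFFIX)) {
                continue; // not a meta file
            }
            String stem = name.substring(0, name.length() - META_SUFFIX.length());
            int sep = stem.lastIndexOf('_');
            if (sep < BLOCK_PREFIX.length()) {
                continue; // no "_<genstamp>" part after the block id
            }
            try {
                long stamp = Long.parseLong(stem.substring(sep + 1));
                stamps.put(stem.substring(0, sep), stamp);
            } catch (NumberFormatException e) {
                // Malformed generation stamp: leave this file out of the index.
            }
        }
        return stamps;
    }

    /** Constant-time lookup that replaces a rescan of the listing per block. */
    static long generationStampFor(Map<String, Long> stamps, String blockFileName) {
        return stamps.getOrDefault(blockFileName, NO_GENERATION_STAMP);
    }

    public static void main(String[] args) {
        String[] listing = {"blk_1", "blk_1_1001.meta", "blk_-7", "blk_-7_42.meta", "blk_9"};
        Map<String, Long> stamps = indexGenerationStamps(listing);
        for (String name : listing) {
            if (name.startsWith(BLOCK_PREFIX) && !name.endsWith(META_SUFFIX)) {
                System.out.println(name + " -> " + generationStampFor(stamps, name));
            }
        }
    }
}
```

With n files in a directory, the per-block rescan costs on the order of n
string comparisons for each block, so roughly n^2 for the whole directory; the
indexed version does one pass plus one hash lookup per block.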