[
https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890285#comment-16890285
]
Stephen O'Donnell commented on HDFS-13693:
------------------------------------------
I think this performance improvement is a great discovery, but the change does
carry some future risk, in that if something changes in how the image is loaded
it would be easy to miss this optimization. However, most changes involve some
risk, and this does give a decent speed improvement, so it's probably worth it.
I tried this change in my testing around loading the fsimage in parallel in
HDFS-14617. In the single-threaded case, the load time improved by about
35 seconds (326 to 291 seconds for just the directory section load time), but
when I moved to parallel loading (4 threads), this change had negligible
impact, probably because the work was spread out over more threads and there
are other points of serialization that slow things down.
I am happy for this to go in but thought it was worth highlighting the above.
> Remove unnecessary search in INodeDirectory.addChild during image loading
> -------------------------------------------------------------------------
>
> Key: HDFS-13693
> URL: https://issues.apache.org/jira/browse/HDFS-13693
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: zhouyingchao
> Assignee: Lisheng Sun
> Priority: Major
> Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch,
> HDFS-13693-003.patch, HDFS-13693-004.patch, HDFS-13693-005.patch
>
>
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added
> to their parent INode's map one by one. The add procedure searches for a
> position in the parent's map and then inserts the child at that position.
> However, during image loading the search is unnecessary, since the insert
> position should always be at the end of the map given the order in which the
> children are serialized on disk.
> Testing this patch against the fsimage of a 70PB cluster (200 million files
> and 300 million blocks) reduced the image loading time from 1210 seconds to
> 1138 seconds, a saving of 72 seconds (about 6%).
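
The idea in the description above can be sketched roughly as follows. This is
only an illustration, not the patch itself: SortedChildren, addChildAtLoading
and snapshot are hypothetical names, and plain Strings stand in for the child
names that the NameNode actually compares. It contrasts a general add that
searches for the insert position with a loading-time add that simply appends,
which is only safe if the caller supplies children in sorted order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a directory's sorted child list; not the HDFS class. */
class SortedChildren {
    private final List<String> names = new ArrayList<>();

    /** General path: binary-search for the insert position, then insert there. */
    boolean addChild(String name) {
        int idx = Collections.binarySearch(names, name);
        if (idx >= 0) {
            return false; // a child with this name already exists
        }
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        names.add(-idx - 1, name);
        return true;
    }

    /**
     * Loading path (assumption: input arrives in ascending order, as the
     * description says is the case for the on-disk image). No search is done
     * and the order is not verified; the child is appended at the end.
     */
    boolean addChildAtLoading(String name) {
        names.add(name);
        return true;
    }

    /** Returns a copy of the current child order for inspection. */
    List<String> snapshot() {
        return new ArrayList<>(names);
    }
}
```

For input that is already sorted, both methods leave the list in the same
order; the loading variant just skips the search on every insert. If the input
were ever not sorted, the append variant would silently break the ordering,
which is the kind of future risk raised in the comment above.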