[
https://issues.apache.org/jira/browse/HDFS-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324222#comment-17324222
]
Xiaoqiao He commented on HDFS-15987:
------------------------------------
Moved this JIRA to be a sub-task of HDFS-14617.
> Improve oiv tool to parse fsimage file in parallel with delimited format
> ------------------------------------------------------------------------
>
> Key: HDFS-15987
> URL: https://issues.apache.org/jira/browse/HDFS-15987
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Hongbing Wang
> Assignee: Hongbing Wang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The purpose of this Jira is to improve the oiv tool so that it parses fsimage
> files with sub-sections (see -HDFS-14617-) in parallel in Delimited format.
> 1. Serial parsing is time-consuming
> The time spent in each stage when serially parsing a large fsimage in Delimited
> format (e.g. `hdfs oiv -p Delimited -t <tmp> ...`) breaks down as follows:
> {code:java}
> 1) Loading string table:                  -> negligible
> 2) Loading inode references:              -> negligible
> 3) Loading directories in INode section:  -> slightly time-consuming (3%)
> 4) Loading INode directory section:       -> somewhat time-consuming (11%)
> 5) Output:                                -> very time-consuming (86%)
> {code}
> Since output accounts for 86% of the time, it is the stage that benefits most
> from parallelization.
> 2. How to output in parallel
> The sub-sections are split, in order, into contiguous groups. Each thread
> processes one group and writes its output to its own file, and the per-thread
> output files are finally merged into a single output file.
> 3. Test results
> {code:java}
> Input fsimage: 3.4G, 12 sub-sections, 55,976,500 INodes
> --------------------------------------------------------
> Threads  TotalTime  OutputTime  MergeTime
> 1        18m37s     16m18s      n/a
> 4        8m7s       4m49s       41s
> {code}
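The parallel output step described in item 2 can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code from the pull request: it does not read an fsimage and instead models each sub-section as an in-memory list of already-formatted Delimited rows. The class and method names (`ParallelDelimitedSketch`, `groupInOrder`, `writeParallel`) are invented for the example. It splits the sub-sections, in order, into contiguous groups, lets each thread write its group to its own temporary file, and then concatenates the temporary files in group order.

```java
import java.io.BufferedWriter;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: sub-sections are modeled as lists of pre-formatted rows.
public class ParallelDelimitedSketch {

  // Split sub-section indexes [0, n) into `groups` contiguous, ordered ranges
  // [start, end); earlier groups get one extra sub-section when n % groups != 0.
  static List<int[]> groupInOrder(int n, int groups) {
    List<int[]> ranges = new ArrayList<>();
    int base = n / groups, extra = n % groups, start = 0;
    for (int g = 0; g < groups; g++) {
      int size = base + (g < extra ? 1 : 0);
      ranges.add(new int[] {start, start + size});
      start += size;
    }
    return ranges;
  }

  // Each thread writes one group to its own temp file; the files are then
  // concatenated in group order, so the row order matches a serial run.
  static void writeParallel(List<List<String>> subSections, int threads, Path out)
      throws Exception {
    List<int[]> ranges = groupInOrder(subSections.size(), threads);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Future<Path>> parts = new ArrayList<>();
    for (int[] r : ranges) {
      parts.add(pool.submit(() -> {
        Path part = Files.createTempFile("oiv-part-", ".tmp");
        try (BufferedWriter w = Files.newBufferedWriter(part, StandardCharsets.UTF_8)) {
          for (int i = r[0]; i < r[1]; i++) {
            for (String row : subSections.get(i)) {
              w.write(row);
              w.write('\n');
            }
          }
        }
        return part;
      }));
    }
    pool.shutdown();
    // Merge step: stream each per-thread file into the final output, in order.
    try (OutputStream os = Files.newOutputStream(out)) {
      for (Future<Path> f : parts) {
        Path part = f.get();
        Files.copy(part, os);
        Files.delete(part);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // 12 hypothetical sub-sections with 3 rows each, standing in for INode rows.
    List<List<String>> subSections = new ArrayList<>();
    for (int s = 0; s < 12; s++) {
      List<String> rows = new ArrayList<>();
      for (int k = 0; k < 3; k++) {
        rows.add("/dir" + s + "/file" + k + "\t0");
      }
      subSections.add(rows);
    }
    Path out = Files.createTempFile("oiv-out-", ".tsv");
    writeParallel(subSections, 4, out);

    List<String> expected = new ArrayList<>();
    subSections.forEach(expected::addAll);
    List<String> merged = Files.readAllLines(out, StandardCharsets.UTF_8);
    System.out.println("rows=" + merged.size() + " ordered=" + merged.equals(expected));
    Files.delete(out);
  }
}
```

Running `main` should print `rows=36 ordered=true`. Keeping the groups contiguous is what lets the merge be a plain in-order concatenation; assigning sub-sections to threads round-robin would instead require interleaving the per-thread files segment by segment.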
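As a rough sanity check on the test results, Amdahl's law gives the best end-to-end speedup possible when only the output stage is parallelized. The sketch below plugs in the times from the table above. It assumes (an assumption, not a measurement) that the non-output stages take the same time with 4 threads as with 1. `OivSpeedup`, `seconds`, and `amdahl` are hypothetical names.

```java
import java.util.Locale;

public class OivSpeedup {

  // Convert a minutes/seconds pair from the test table to seconds.
  static int seconds(int minutes, int secs) {
    return minutes * 60 + secs;
  }

  // Amdahl's law: overall speedup when a fraction p of the work is spread
  // perfectly over n threads and the rest stays serial.
  static double amdahl(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
  }

  public static void main(String[] args) {
    int serialTotal = seconds(18, 37);  // 1 thread, TotalTime
    int serialOutput = seconds(16, 18); // 1 thread, OutputTime
    int parTotal = seconds(8, 7);       // 4 threads, TotalTime
    int parOutput = seconds(4, 49);     // 4 threads, OutputTime
    int merge = 41;                     // 4 threads, MergeTime
    int threads = 4;

    // Fraction of the serial run spent in the parallelizable output stage.
    double p = (double) serialOutput / serialTotal;
    // Assumption: the non-output stages cost the same in both runs.
    double idealWithMerge = (serialTotal - serialOutput)
        + (double) serialOutput / threads + merge;

    System.out.println(String.format(Locale.ROOT,
        "output-stage speedup: %.2fx", (double) serialOutput / parOutput));
    System.out.println(String.format(Locale.ROOT,
        "end-to-end speedup: %.2fx", (double) serialTotal / parTotal));
    System.out.println(String.format(Locale.ROOT,
        "Amdahl bound: %.2fx", amdahl(p, threads)));
    System.out.println(String.format(Locale.ROOT,
        "Amdahl bound plus merge: %.2fx", serialTotal / idealWithMerge));
  }
}
```

It prints an output-stage speedup of 3.38x and an end-to-end speedup of 2.29x, against an Amdahl bound of 2.91x (2.63x once the 41s merge is added). Under the stated assumption, the merge time and the less-than-4x scaling of the output stage together account for most of the gap to the ideal.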
--
This message was sent by Atlassian Jira
(v8.3.4#803005)