[
https://issues.apache.org/jira/browse/HDFS-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xiaoqiao He resolved HDFS-15987.
--------------------------------
Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Resolution: Fixed
Committed to trunk. Thanks [~wanghongbing] for your contributions!
> Improve oiv tool to parse fsimage file in parallel with delimited format
> ------------------------------------------------------------------------
>
> Key: HDFS-15987
> URL: https://issues.apache.org/jira/browse/HDFS-15987
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Hongbing Wang
> Assignee: Hongbing Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: Improve_oiv_tool_001.pdf
>
> Time Spent: 6.5h
> Remaining Estimate: 0h
>
> The purpose of this Jira is to improve the oiv tool so that it parses an
> fsimage file with sub-sections (see HDFS-14617) in parallel when using the
> Delimited format.
> 1. Serial parsing is time-consuming
> Serially parsing a large fsimage in Delimited format (e.g. `hdfs
> oiv -p Delimited -t <tmp> ...`) spends its time as follows:
> {code:java}
> 1) Loading string table:                 -> Not time consuming
> 2) Loading inode references:             -> Not time consuming
> 3) Loading directories in INode section: -> Slightly time consuming (3%)
> 4) Loading INode directory section:      -> A bit time consuming (11%)
> 5) Output:                               -> Very time consuming (86%){code}
> Therefore, output is the stage that benefits most from parallelization.
> 2. How to output in parallel
> The sub-sections are grouped in order. Each thread processes one group and
> writes its output to its own file, and the per-thread output files are
> merged at the end.
> 3. Test results
> {code:java}
> input fsimage file info:
> 3.4G, 12 sub-sections, 55976500 INodes
> -----------------------------------------
> Threads  TotalTime  OutputTime  MergeTime
> 1        18m37s     16m18s      –
> 4        8m7s       4m49s       41s{code}
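
The parallel output scheme in item 2 of the description (group the sub-sections in order, give each thread its own output file, then merge the files) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch committed for HDFS-15987: the class `ParallelOutputSketch` and the methods `groupInOrder`, `renderSubSection`, and `writeInParallel` are hypothetical names, and `renderSubSection` emits placeholder lines instead of reading INodes from an fsimage.

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelOutputSketch {

  /** Split indices 0..subSections-1 into at most `threads` contiguous groups, keeping order. */
  static List<List<Integer>> groupInOrder(int subSections, int threads) {
    List<List<Integer>> groups = new ArrayList<>();
    int base = subSections / threads;   // minimum group size
    int extra = subSections % threads;  // the first `extra` groups get one more
    int next = 0;
    for (int t = 0; t < threads && next < subSections; t++) {
      int size = base + (t < extra ? 1 : 0);
      List<Integer> group = new ArrayList<>();
      for (int i = 0; i < size; i++) {
        group.add(next++);
      }
      groups.add(group);
    }
    return groups;
  }

  /** Hypothetical stand-in for rendering one sub-section as delimited text. */
  static String renderSubSection(int index) {
    return "subsection-" + index + "\n";
  }

  /** Each task writes its group to its own part file; parts are then merged in group order. */
  static void writeInParallel(int subSections, int threads, Path output) throws Exception {
    List<List<Integer>> groups = groupInOrder(subSections, threads);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Path> parts = new ArrayList<>();
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (List<Integer> group : groups) {
        Path part = Files.createTempFile("oiv-part-", ".tmp");
        parts.add(part);
        pending.add(pool.submit(() -> {
          StringBuilder sb = new StringBuilder();
          for (int idx : group) {
            sb.append(renderSubSection(idx));
          }
          Files.write(part, sb.toString().getBytes(StandardCharsets.UTF_8));
          return null;
        }));
      }
      // Wait for every group to finish before merging.
      for (Future<?> f : pending) {
        f.get();
      }
      // Merge: append the part files to the final output in group order.
      try (OutputStream out = Files.newOutputStream(output)) {
        for (Path part : parts) {
          Files.copy(part, out);
        }
      }
    } finally {
      pool.shutdown();
      for (Path part : parts) {
        Files.deleteIfExists(part);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Path out = Files.createTempFile("oiv-merged-", ".txt");
    // 12 sub-sections and 4 threads, matching the shape of the test above.
    writeInParallel(12, 4, out);
    System.out.print(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
  }
}
```

Because the groups are contiguous and the part files are appended in group order, the merged file lists sub-sections in the same order a single-threaded run would produce. The merge itself is serial, which is consistent with the separate MergeTime column in the test results.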
>
>
>