[jira] [Commented] (HDFS-15987) Improve oiv tool to parse fsimage file in parallel with delimited format

Xiaoqiao He (Jira) Sat, 17 Apr 2021 03:10:28 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324222#comment-17324222
 ]


Xiaoqiao He commented on HDFS-15987:
------------------------------------

Moved this JIRA to be a sub-task of HDFS-14617.

> Improve oiv tool to parse fsimage file in parallel with delimited format
> ------------------------------------------------------------------------
>
>                 Key: HDFS-15987
>                 URL: https://issues.apache.org/jira/browse/HDFS-15987
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Hongbing Wang
>            Assignee: Hongbing Wang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The purpose of this Jira is to improve the oiv tool so that it parses fsimage 
> files with sub-sections (see HDFS-14617) in parallel when writing the 
> delimited format. 
> 1. Serial parsing is time-consuming
> Serially parsing a large fsimage with the delimited format (e.g. `hdfs 
> oiv -p Delimited -t <tmp> ...`) spends its time as follows: 
> {code:java}
> 1) Loading string table:                 -> Not time consuming
> 2) Loading inode references:             -> Not time consuming
> 3) Loading directories in INode section: -> Slightly time consuming (3%)
> 4) Loading INode directory section:      -> A bit time consuming (11%)
> 5) Output:                               -> Very time consuming (86%){code}
> Therefore, output is the stage that benefits most from parallelization.
> 2. How to output in parallel
> The sub-sections are grouped in order. Each thread processes one group and 
> writes its output to its own file, and the per-thread files are finally 
> merged into the output file.
> 3. Test result
> {code:java}
>  Input fsimage file info:
>  3.4G, 12 sub-sections, 55976500 INodes
>  -----------------------------------------
>  Threads  TotalTime  OutputTime  MergeTime
>  1        18m37s     16m18s      –
>  4        8m7s       4m49s       41s{code}
>  
>  
>  
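The parallel output scheme described in the quoted issue (group the sub-sections in order, let each thread write its group to its own file, then merge the files) can be sketched roughly as below. This is only an illustration in Python, not the oiv implementation, which is written in Java. The names `group_in_order`, `write_group`, `parallel_output` and the `render` callback are hypothetical, and sub-sections are modelled as plain lists of records.

```python
import itertools
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def group_in_order(items, num_groups):
    """Split items into at most num_groups contiguous groups, preserving order."""
    size, extra = divmod(len(items), num_groups)
    groups, start = [], 0
    for i in range(num_groups):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty groups when there are fewer items than groups
            groups.append(items[start:end])
        start = end
    return groups


def write_group(group, path, render):
    """Write every record of every sub-section in the group to one part file."""
    with open(path, "w", encoding="utf-8") as out:
        for sub_section in group:
            for record in sub_section:
                out.write(render(record) + "\n")
    return path


def parallel_output(sub_sections, num_threads, final_path, render=str):
    """Hypothetical driver: one part file per group, merged in group order."""
    groups = group_in_order(sub_sections, num_threads)
    tmp_dir = tempfile.mkdtemp(prefix="oiv-sketch-")
    try:
        part_paths = [os.path.join(tmp_dir, f"part-{i}") for i in range(len(groups))]
        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            # list() waits for all workers and re-raises any worker exception.
            list(pool.map(write_group, groups, part_paths, itertools.repeat(render)))
        # Merge step: concatenating parts in group order keeps the serial ordering.
        with open(final_path, "wb") as merged:
            for path in part_paths:
                with open(path, "rb") as part:
                    shutil.copyfileobj(part, merged)
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Because the groups are contiguous and the parts are concatenated in group order, the merged file has the same line order as a serial run would produce. The merge is an extra sequential copy, which corresponds to the separate MergeTime column in the test result.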



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
