[
https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284942#comment-14284942
]
Haohui Mai commented on HDFS-6673:
----------------------------------
Just to recap, the current approach is (please correct me if I'm wrong):
# Scan the {{INodeDirectorySection}} linearly and put a map {{childId ->
parentId}} into LevelDB.
# Scan the {{INodeSection}} and store a map {{id -> localName}} into LevelDB
for all directories.
# Scan the {{INodeSection}} and, for each inode, construct the full path by
looking up its ancestors in LevelDB.
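The three steps above can be sketched as follows, with plain in-memory dicts standing in for the LevelDB tables; the inode ids and names are hypothetical, not from any real fsimage:

```python
# Step 1: childId -> parentId for every inode (hypothetical ids; 1 = root).
child_to_parent = {2: 1, 3: 1, 4: 2}
# Step 2: id -> localName for all directories.
dir_name = {1: "", 2: "dir_a", 3: "dir_b"}

# Step 3: for each inode, walk the parent chain to build the full path.
# Every hop is a LevelDB lookup, i.e. a potential disk seek on a cache miss.
def full_path(inode_id, local_name):
    parts = [local_name]
    while inode_id in child_to_parent:
        parent = child_to_parent[inode_id]
        parts.append(dir_name[parent])
        inode_id = parent
    return "/".join(reversed(parts))

print(full_path(4, "file.txt"))   # /dir_a/file.txt
```

Note that the number of lookups in step 3 grows with the depth of each inode, which is where the per-inode seek cost discussed below comes from.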
The size of the LevelDB is {{#inodes * sizeof(inodeid) * 2}} + {{local names
for all directories}} (as every inode has a parent). As a rough estimate, the
LevelDB is more than 8G for an image that contains 400M inodes. This is large
enough that it may not fit in the working set.
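The arithmetic behind the estimate, assuming 8-byte inode ids (an assumption; the exact on-disk size also depends on LevelDB's own overhead):

```python
inodes = 400_000_000
id_bytes = 8                           # assuming 8-byte inode ids
map_bytes = inodes * id_bytes * 2      # #inodes * sizeof(inodeid) * 2
print(map_bytes / 10**9)               # 6.4 GB before adding directory local names
```

Adding the local names for all directories pushes the total past 8G.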
Step (3) requires several LevelDB lookups per inode. (I'm skeptical that an LRU
cache actually helps, since there is really no locality here, as mentioned
earlier.) My concern is that once the LevelDB fails to fit in the working set,
each lookup costs at least one seek. Note that a typical HDD serves around 100
IOPS, so for 400M inodes it takes 400M / 100 = 4M seconds, i.e., ~1000 hours to
complete.
My proposal is:
# Scan the {{INodeDirectorySection}} linearly and put a map {{childId ->
parentId}} in memory.
# Scan the {{INodeSection}} and for each inode, store a map {{parentid ||
localName -> info}} into LevelDB for all inodes.
# Scan the LevelDB using DFS and then output the result.
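The key point of the {{parentid || localName}} scheme is that LevelDB stores keys in sorted order, so all children of a directory are contiguous and step (3) costs one range scan (one seek) per directory rather than per file. A minimal sketch, with a sorted dict standing in for LevelDB and hypothetical inode ids/names:

```python
import struct

def key(parent_id, local_name):
    # parentId || localName: big-endian id so byte order matches numeric order
    return struct.pack(">Q", parent_id) + local_name.encode()

# "info" would be whatever fields the Delimited output needs per inode.
db = {
    key(1, "dir_a"): "info(dir_a, id=2)",
    key(1, "file1"): "info(file1)",
    key(2, "file2"): "info(file2)",
}

def children(parent_id):
    # one contiguous range scan per directory over the sorted keys
    prefix = struct.pack(">Q", parent_id)
    return [(k[8:].decode(), v) for k, v in sorted(db.items())
            if k.startswith(prefix)]

def dfs(parent_id, path, dir_ids):
    for name, info in children(parent_id):
        print(path + "/" + name, info)
        if name in dir_ids:                      # recurse into subdirectories
            dfs(dir_ids[name], path + "/" + name, dir_ids)

dfs(1, "", {"dir_a": 2})
```

With a real LevelDB the `children` scan would be an iterator seek to `prefix` followed by sequential reads, which is what bounds the total seek count by the number of directories.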
The differences are: (1) it performs more writes, as it stores all required
information in the LevelDB; (2) it requires a bigger in-memory working set in
step (1); (3) there is only one seek per directory instead of one seek per file
in step (3), which bounds the total time when processing a large fsimage.
More comments:
bq. the end-to-end time is about 40-50 minutes, while the time to dump INodes
along is about 20-ish minutes, which is already larger than the end-to-end time
now (10 minutes) ... I had tried use directory ID || inode Id as key and INode
protobuf as value to store all INodes in LevelDB
This is an apples-to-oranges comparison. Protobuf has significant overhead due
to excessive object creation; I found it takes ~30% of the total processing
time when building the PB-based fsimage. I suggest dumping only the required
information in a customized format for this patch.
> Add Delimited format supports for PB OIV tool
> ---------------------------------------------
>
> Key: HDFS-6673
> URL: https://issues.apache.org/jira/browse/HDFS-6673
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 2.4.0
> Reporter: Lei (Eddy) Xu
> Assignee: Lei (Eddy) Xu
> Priority: Minor
> Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch,
> HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch,
> HDFS-6673.005.patch
>
>
> The new oiv tool, which is designed for Protobuf fsimage, lacks a few
> features supported in the old {{oiv}} tool.
> This task adds support for the _Delimited_ processor to the oiv tool.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)