[ https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284942#comment-14284942 ]

Haohui Mai commented on HDFS-6673:
----------------------------------

Just to recap, the current approach is (please correct me if I'm wrong):

  # Scan the {{INodeDirectorySection}} linearly and put a map {{childId -> 
parentId}} into LevelDB. 
  # Scan the {{INodeSection}} and store a map {{id -> localName}} into LevelDB 
for all directories.
  # Scan the {{INodeSection}} and, for each inode, construct the full path by 
looking up the parent chain in the LevelDB (see the sketch below).
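
For concreteness, here is a minimal sketch of steps (1) and (3) against the org.iq80.leveldb bindings. The class, the helper names, and the two-DB split are illustrative assumptions, not what the current patch does:

{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.iq80.leveldb.DB;

// Illustrative sketch only; the names here are hypothetical.
class CurrentApproachSketch {
  static byte[] toBytes(long id) {
    return ByteBuffer.allocate(8).putLong(id).array();
  }

  // Step 1: record childId -> parentId for every directory entry.
  static void recordEdge(DB parents, long childId, long parentId) {
    parents.put(toBytes(childId), toBytes(parentId));
  }

  // Step 3: rebuild one inode's full path by walking parent pointers.
  // Every hop is a LevelDB lookup, i.e. potentially a disk seek once
  // the DB outgrows the working set.
  static String fullPath(DB parents, DB dirNames, long inodeId,
                         String localName) {
    StringBuilder path = new StringBuilder(localName);
    byte[] parent = parents.get(toBytes(inodeId));
    while (parent != null) {
      long parentId = ByteBuffer.wrap(parent).getLong();
      // Step 2's id -> localName map; the root's local name is empty,
      // which naturally yields the leading "/".
      byte[] name = dirNames.get(toBytes(parentId));
      path.insert(0, '/');
      if (name != null) {
        path.insert(0, new String(name, StandardCharsets.UTF_8));
      }
      parent = parents.get(toBytes(parentId));
    }
    return path.toString();
  }
}
{code}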

The size of the LevelDB is {{#inodes * sizeof(inodeid) * 2}} plus the local 
names of all directories (as every inode has a parent). As a rough estimate, 
for an image with 400M inodes the id pairs alone take 400M * 8 bytes * 2 = 
6.4G, so the LevelDB exceeds 8G once local names are added. This is large 
enough that it may not fit in the working set.

Step (3) requires several LevelDB lookups per inode. (I'm skeptical that an 
LRU cache actually helps, since, as mentioned earlier, there is really no 
locality here.) My concern is that once the LevelDB fails to fit in the 
working set, each lookup costs at least one seek. Note that a typical HDD 
serves around 100 IOPS, so for 400M inodes it takes 400M / 100 = 4M seconds, 
roughly 1,100 hours, to complete.

My proposal is:

  # Scan the {{INodeDirectorySection}} linearly and put a map {{childId -> 
parentId}} in memory.
  # Scan the {{INodeSection}} and, for each inode, store a map {{parentId || 
localName -> info}} into LevelDB (see the key-layout sketch below).
  # Scan the LevelDB using DFS and then output the result.
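
A sketch of the composite key and the per-directory seek in step (3), again over the org.iq80.leveldb bindings (names illustrative). Because the 8-byte parentId prefix is big-endian, LevelDB's lexicographic key order groups all children of a directory contiguously:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Map;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.DBIterator;

class ProposedLayoutSketch {
  // Step 2: key = parentId || localName. Big-endian ids preserve
  // numeric order under LevelDB's lexicographic comparison.
  static byte[] key(long parentId, byte[] localName) {
    return ByteBuffer.allocate(8 + localName.length)
        .putLong(parentId).put(localName).array();
  }

  // Step 3: DFS. One seek positions the iterator at a directory's
  // first child; from there the scan is sequential, hence one seek
  // per directory rather than one per file.
  static void visitChildren(DB db, long parentId) throws IOException {
    byte[] prefix = ByteBuffer.allocate(8).putLong(parentId).array();
    try (DBIterator it = db.iterator()) {
      it.seek(prefix);
      while (it.hasNext()) {
        Map.Entry<byte[], byte[]> e = it.next();
        if (!hasPrefix(e.getKey(), prefix)) {
          break; // past the last child of this directory
        }
        // emit e.getValue() (the inode's info record); if the entry
        // is a directory, recurse on its inode id to continue the DFS
      }
    }
  }

  static boolean hasPrefix(byte[] key, byte[] prefix) {
    if (key.length < prefix.length) return false;
    for (int i = 0; i < prefix.length; i++) {
      if (key[i] != prefix[i]) return false;
    }
    return true;
  }
}
{code}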

The differences are: (1) it issues more writes, since it stores all required 
information in the LevelDB; (2) it requires a larger in-memory working set at 
step (1); (3) there is only one seek per directory instead of one seek per 
file in step (3), which bounds the total time when processing a large fsimage.

More comments:

bq. the end-to-end time is about 40-50 minutes, while the time to dump INodes 
alone is about 20-ish minutes, which is already larger than the end-to-end 
time now (10 minutes) ... I had tried using directory ID || inode Id as key 
and the INode protobuf as value to store all INodes in LevelDB

This is an apples-to-oranges comparison. Protobuf has significant overhead 
due to excessive object creation; I found it takes ~30% of the total 
processing time when building the PB-based fsimage. I suggest dumping only 
the required information in a customized format for this patch (see the 
sketch below).
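
For example (the field set below is hypothetical; the real patch would pick whatever the Delimited output needs), a flat fixed-width record avoids the per-inode protobuf object churn:

{code:java}
import java.nio.ByteBuffer;

class InodeInfoCodec {
  // Fixed-width record; these six fields are just an example.
  static byte[] encode(long id, short replication, long mtime,
                       long atime, long blockSize, long fileSize) {
    return ByteBuffer.allocate(42) // 8 + 2 + 8 + 8 + 8 + 8 bytes
        .putLong(id).putShort(replication).putLong(mtime)
        .putLong(atime).putLong(blockSize).putLong(fileSize)
        .array();
  }

  static void decode(byte[] value) {
    ByteBuffer b = ByteBuffer.wrap(value);
    long id = b.getLong();
    short replication = b.getShort();
    long mtime = b.getLong();
    long atime = b.getLong();
    long blockSize = b.getLong();
    long fileSize = b.getLong();
    // ... feed straight into the delimited writer, no PB objects ...
  }
}
{code}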

> Add Delimited format supports for PB OIV tool
> ---------------------------------------------
>
>                 Key: HDFS-6673
>                 URL: https://issues.apache.org/jira/browse/HDFS-6673
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 2.4.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>            Priority: Minor
>         Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, 
> HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, 
> HDFS-6673.005.patch
>
>
> The new oiv tool, which is designed for the protobuf-based fsimage, lacks a 
> few features supported in the old {{oiv}} tool. 
> This task adds support for the _Delimited_ processor to the oiv tool. 


