[ 
https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284696#comment-14284696
 ] 

Lei (Eddy) Xu commented on HDFS-6673:
-------------------------------------

[~wheat9] To provide more background, here is what I tried:

1. I tried using {{directory ID || inode Id}} as the key and the {{INode}} 
protobuf as the value to store all INodes in LevelDB. The end-to-end time was 
about 40-50 minutes, and dumping the INodes alone took about 20 minutes, which 
is already longer than the current end-to-end time (10 minutes). Moreover, once 
the LevelDB grew larger (about 1GB, as I recall), write performance dropped 
significantly. I suspect this is due to 
[write-amplification|https://github.com/facebook/rocksdb/wiki/RocksDB-Basics]. 
I also tried splitting one large LevelDB into multiple smaller ones, but it 
was not worth the complexity. As a result, I dropped this approach and chose 
not to re-order inodes.
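
For illustration, a composite key of this shape could be built as below. This is only a sketch and is not taken from the patch: the class and method names are hypothetical, and it assumes both IDs are non-negative 64-bit values. With big-endian encoding, a bytewise key comparator (the LevelDB default) keeps all children of one directory adjacent.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper for a {directory ID || inode ID} key; not from the patch. */
final class InodeKey {
  private InodeKey() {}

  /** Encodes (parentDirId, inodeId) as a 16-byte big-endian key. */
  static byte[] encode(long parentDirId, long inodeId) {
    // ByteBuffer is big-endian by default, so for non-negative IDs the
    // bytewise order of the keys matches (parentDirId, inodeId) order.
    return ByteBuffer.allocate(2 * Long.BYTES)
        .putLong(parentDirId)
        .putLong(inodeId)
        .array();
  }

  /** Reads back the directory ID stored in the first 8 bytes. */
  static long parentDirId(byte[] key) {
    return ByteBuffer.wrap(key).getLong(0);
  }

  /** Reads back the inode ID stored in the last 8 bytes. */
  static long inodeId(byte[] key) {
    return ByteBuffer.wrap(key).getLong(Long.BYTES);
  }
}
```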

2. 
bq. This does not hold. FSImage stores the inodes with no order. See 
{{FSImageFormatPBINode#serializeINodeSection.}}

Yes, you are right.  But judging from {{INode#hashCode()}}, it seems that they 
are not completely random when {{INode <= 2 ** 32}}. Regardless, since 
{{dirChildMap}} uses {{Long}} for both keys and values, its size is about 
2 orders of magnitude smaller than the fsimage.  So if the fsimage is 
{{50GB}}, the LevelDB is less than 1GB and should fit reasonably well into the 
OS cache on a laptop.  So perhaps one seek per INode is not terribly bad?
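
To make the hash observation concrete, here is a small sketch. It assumes (this is an assumption, not verified against the source here) that {{INode#hashCode()}} folds the 64-bit inode ID the same way {{Long.hashCode}} does. Under that assumption, any ID below 2 ** 32 hashes to itself, so consecutive IDs produce consecutive hash values. The class name is hypothetical.

```java
/** Hypothetical demo; assumes the inode hash folds the ID like Long.hashCode. */
final class InodeHashDemo {
  private InodeHashDemo() {}

  /** Same fold as Long.hashCode(long): (int) (id ^ (id >>> 32)). */
  static int foldedHash(long inodeId) {
    return (int) (inodeId ^ (inodeId >>> 32));
  }

  /** True if two consecutive IDs also have consecutive hash values. */
  static boolean hashesAreAdjacent(long inodeId) {
    return foldedHash(inodeId + 1) - foldedHash(inodeId) == 1;
  }
}
```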

3. The {{DirPathCache}} caches the *full path* of the parent directory with 16K 
entries. Assuming the average full path of a directory is about 128 bytes, it 
uses only about 2MB of memory (16K x 128 bytes). I suppose we can increase the 
capacity of this LRU cache later, once we actually measure the hit rates. I 
believe that this LRU cache should work, given that the measured performance of 
this approach is faster.
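
As a rough sketch of the idea only (this is not the {{DirPathCache}} code from the patch, and the class name is hypothetical), a bounded LRU cache from directory inode ID to full path can be built on {{LinkedHashMap}} in access order:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for DirPathCache: directory inode ID -> full path. */
final class DirPathLruCache extends LinkedHashMap<Long, String> {
  private final int capacity;

  DirPathLruCache(int capacity) {
    // accessOrder = true: iteration order runs from least to most recently used.
    super(16, 0.75f, true);
    this.capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
    // Called after each put; evicts the least recently used entry when full.
    return size() > capacity;
  }
}
```

A cache created with {{new DirPathLruCache(16 * 1024)}} would hold 16K entries; the capacity is a constructor argument so it can be raised once hit rates are measured.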

4. Unlike in {{FileDistributionCalculator}}, we need the full path of an inode 
when printing it.  Since directories and inodes are stored out of order in the 
fsimage, we need to sort at least the directories or the inodes to some extent. 
I chose to sort the directories, because:

# The total # of directories is much smaller.
# The LRU cache is more (only) effective for directories. 
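
The path lookup described in point 4 can be sketched roughly as follows. This is a simplified illustration, not the patch itself: plain in-memory maps stand in for the LevelDB-backed {{dirChildMap}}, the path cache is unbounded for brevity, and all class, field, and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: rebuilds full paths from child -> parent links. */
final class PathResolver {
  private final long rootId;
  private final Map<Long, Long> parentOf;    // inode ID -> parent directory ID
  private final Map<Long, String> dirNames;  // directory ID -> directory name
  private final Map<Long, String> pathCache = new HashMap<>();

  PathResolver(long rootId, Map<Long, Long> parentOf, Map<Long, String> dirNames) {
    this.rootId = rootId;
    this.parentOf = parentOf;
    this.dirNames = dirNames;
  }

  /** Full path of a directory; the root is represented as the empty string. */
  String dirPath(long dirId) {
    if (dirId == rootId) {
      return "";
    }
    String cached = pathCache.get(dirId);
    if (cached != null) {
      return cached;
    }
    Long parent = parentOf.get(dirId);
    if (parent == null) {
      throw new IllegalStateException("No parent recorded for inode " + dirId);
    }
    // Walk up towards the root, then cache the result for sibling lookups.
    String path = dirPath(parent) + "/" + dirNames.get(dirId);
    pathCache.put(dirId, path);
    return path;
  }

  /** Full path of any inode, given its own name. */
  String fullPath(long inodeId, String name) {
    Long parent = parentOf.get(inodeId);
    if (parent == null) {
      throw new IllegalStateException("No parent recorded for inode " + inodeId);
    }
    return dirPath(parent) + "/" + name;
  }
}
```

Only directory paths are cached here, which reflects the reasoning above: many inodes share the same parent directory, so a cached directory path is reused for every sibling.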

Do these make sense to you, [~wheat9]? It would be great if I could get a +1 
from you.

Thanks!

> Add Delimited format supports for PB OIV tool
> ---------------------------------------------
>
>                 Key: HDFS-6673
>                 URL: https://issues.apache.org/jira/browse/HDFS-6673
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 2.4.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>            Priority: Minor
>         Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, 
> HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, 
> HDFS-6673.005.patch
>
>
> The new oiv tool, which is designed for Protobuf fsimage, lacks a few 
> features supported in the old {{oiv}} tool. 
> This task adds supports of _Delimited_ processor to the oiv tool. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
