[
https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284696#comment-14284696
]
Lei (Eddy) Xu commented on HDFS-6673:
-------------------------------------
[~wheat9] To provide more background, here is what I have tried:
1. I tried using {{directory ID || inode ID}} as the key and the {{INode}}
protobuf as the value to store all INodes in LevelDB. The end-to-end time was
about 40-50 minutes, and the time to dump the INodes alone was about 20 minutes,
which is already longer than the current end-to-end time (10 minutes). Moreover,
when the LevelDB became larger (about 1GB, as I recall), the write performance
dropped significantly. I suspect that this was due to
[write amplification|https://github.com/facebook/rocksdb/wiki/RocksDB-Basics].
I also tried splitting one large LevelDB into multiple smaller ones, but it
was not worth the complexity. As a result, I dropped this approach and chose
not to re-order inodes.
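
To illustrate the key layout from point 1, here is a minimal sketch of a {{directory ID || inode ID}} key. It assumes both IDs are non-negative longs encoded big-endian, so that a bytewise key comparator (LevelDB's default) keeps the children of one directory adjacent. The class and method names are hypothetical and are not taken from the patch.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper: builds a 16-byte key of the form directoryId || inodeId.
final class InodeKeys {
    /** Concatenates the two IDs big-endian (ByteBuffer's default byte order). */
    static byte[] compositeKey(long dirId, long inodeId) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(dirId)
                .putLong(inodeId)
                .array();
    }

    public static void main(String[] args) {
        byte[] a = compositeKey(16385L, 16390L);
        byte[] b = compositeKey(16385L, 16391L);
        byte[] c = compositeKey(16386L, 16387L);
        // Siblings share the same 8-byte directory prefix.
        System.out.println(Arrays.equals(Arrays.copyOf(a, 8), Arrays.copyOf(b, 8))); // true
        // Bytewise order places them before the next directory's children.
        System.out.println(Arrays.compareUnsigned(a, c) < 0); // true
    }
}
```

With such keys every inode becomes a LevelDB write of the full protobuf value, which is consistent with the write cost described above.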
2.
bq. This does not hold. FSImage stores the inodes with no order. See
{{FSImageFormatPBINode#serializeINodeSection}}.
Yes, you are right. But judging from {{INode#hashCode()}}, it seems that they
are not completely random when {{inode ID <= 2 ** 32}}. Regardless, since
{{dirChildMap}} uses {{Long}} for both keys and values, the size of
{{dirChildMap}} is 2 orders of magnitude smaller than the fsimage. So if the
fsimage is {{50GB}}, the LevelDB is less than 1GB and can reasonably fit into
the OS cache on a laptop. Thus one seek per INode is maybe not terribly bad?
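
The remark about {{INode#hashCode()}} can be illustrated with a small sketch. It assumes the hash folds the 64-bit inode ID the same way {{Long.hashCode}} does; that is an assumption made for illustration and should be checked against the Hadoop source. Under that assumption, IDs below 2 ** 32 hash to their own low 32 bits, so consecutive IDs give consecutive hash codes rather than a random spread.

```java
// Hypothetical stand-in for the hash of an inode ID (assumed, not copied
// from Hadoop): fold the high 32 bits into the low 32 bits.
final class InodeHashSketch {
    static int foldedHash(long id) {
        return (int) (id ^ (id >>> 32));
    }

    public static void main(String[] args) {
        // Below 2^32 the high half is zero, so the hash is just the ID itself.
        System.out.println(foldedHash(16385L)); // 16385
        System.out.println(foldedHash(16386L)); // 16386
        // Above 2^32 the high bits start to perturb the low bits.
        System.out.println(foldedHash((1L << 32) + 5)); // 4
    }
}
```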
3. The {{DirPathCache}} caches the *full path* of the parent directory with 16K
entries. Supposing the average full path of a directory is about 128 bytes, it
uses only about 2MB of memory (16K x 128 bytes). I suppose that we can increase
the capacity of this LRU cache later, once we actually measure the hit rates. I
believe that this LRU cache should work, given that the measured performance of
this approach is faster.
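
For readers unfamiliar with the idea in point 3, a bounded LRU cache from directory ID to full path can be sketched with {{LinkedHashMap}} in access order. This is only an illustration; the class name is hypothetical and it is not the {{DirPathCache}} implementation from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache: directory inode ID -> full path of that directory.
final class LruPathCache extends LinkedHashMap<Long, String> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruPathCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
        // Evict the least recently used entry once the capacity is exceeded.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruPathCache cache = new LruPathCache(2);
        cache.put(1L, "/a");
        cache.put(2L, "/b");
        cache.get(1L);        // touch 1, so 2 becomes the eldest entry
        cache.put(3L, "/c");  // evicts 2
        System.out.println(cache.keySet()); // [1, 3]
    }
}
```

A production cache would be created with a capacity of 16K entries, for example {{new LruPathCache(16 * 1024)}}, and the hit rate could be measured by counting {{get}} calls that return non-null.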
4. Unlike in {{FileDistributionCalculator}}, we need the full path of an inode
when printing it. Since directories and inodes are stored out of order in the
fsimage, we need to sort at least the directories or the inodes to some extent.
I chose to sort the directories, because
# The total # of directories is much smaller.
# The LRU cache is more (only) effective for directories.
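
The reasoning in point 4 can be sketched as follows: once the child-to-parent mapping of directories is available, the full path of any inode is the cached path of its parent directory plus its own name. In this sketch in-memory maps stand in for the LevelDB-backed {{dirChildMap}} and for the path cache; all names and the root ID are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical resolver: walks the directory parent chain, caching results.
final class PathResolverSketch {
    static final long ROOT_ID = 1L; // assumed root inode ID, for illustration only

    final Map<Long, Long> parentOf = new HashMap<>();   // directory ID -> parent directory ID
    final Map<Long, String> nameOf = new HashMap<>();   // directory ID -> directory name
    final Map<Long, String> pathCache = new HashMap<>(); // directory ID -> full path

    /** Full path of a directory; the root is represented as the empty string. */
    String dirPath(long dirId) {
        if (dirId == ROOT_ID) {
            return "";
        }
        String cached = pathCache.get(dirId);
        if (cached != null) {
            return cached;
        }
        // Resolve the parent first, then append this directory's name.
        String path = dirPath(parentOf.get(dirId)) + "/" + nameOf.get(dirId);
        pathCache.put(dirId, path);
        return path;
    }

    /** Full path of an inode, given its parent directory ID and its own name. */
    String fullPath(long parentDirId, String name) {
        return dirPath(parentDirId) + "/" + name;
    }
}
```

Because many inodes share the same parent, one cached directory path serves every file in that directory, which is why the cache pays off for directories in particular.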
Does this make sense to you, [~wheat9]? It would be great if I could get a +1
from you.
Thanks!
> Add Delimited format supports for PB OIV tool
> ---------------------------------------------
>
> Key: HDFS-6673
> URL: https://issues.apache.org/jira/browse/HDFS-6673
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 2.4.0
> Reporter: Lei (Eddy) Xu
> Assignee: Lei (Eddy) Xu
> Priority: Minor
> Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch,
> HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch,
> HDFS-6673.005.patch
>
>
> The new oiv tool, which is designed for Protobuf fsimage, lacks a few
> features supported in the old {{oiv}} tool.
> This task adds support for the _Delimited_ processor to the oiv tool.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)