[ https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292617#comment-14292617 ]
Lei (Eddy) Xu commented on HDFS-6673:
-------------------------------------
Thank you so much for the continued input on this issue, [~andrew.wang],
[~wheat9] and [~cmccabe]!
I just want to add a bit more information about our design
considerations.
bq. Convert the fsimage into LevelDB before running the oiv.
We agree that an ordered fsimage in LevelDB can be scanned much faster than
with the approach used in the patch. However, our main concern with this
approach, and the reason we gave up on it, is that writing a large fsimage
(> 1GB) into LevelDB alone is several times (4-5x) slower than the end-to-end
time of the latest patch. We believe the bottleneck is write amplification in
LevelDB rather than in-memory computation (e.g., serialization), because we
observed that the throughput of writing inodes to LevelDB drops continuously
and _significantly_ once the DB grows beyond 1GB. That is why we expect it to
be much worse for even larger fsimages.
To add another data point: for the 3.3GB (33M inodes) fsimage we currently
test, there is less than 300MB of metadata in LevelDB. If we assume that file
distributions are similar among fsimages, we would have a {{2-3GB}} LevelDB
for a {{20GB}} fsimage (and {{6-8GB}} of LevelDB for a {{40GB}}, 400M-inode
fsimage). The working set here is the {{6-8GB}} LevelDB, which is still
arguably reasonable for today's laptop memory. Moreover, today's laptops have
fairly fast SSDs with decent random IO :)
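To illustrate why random IO matters for this metadata store, here is a minimal sketch, assuming the stored metadata maps each directory's inode ID to its parent ID and local name (an assumption for illustration, not necessarily the patch's exact schema). Producing the full path of an inode then means walking parent pointers up to the root, one random lookup per ancestor. A `HashMap` stands in for LevelDB; `PathResolver`, `DirEntry` and `ROOT_ID` are hypothetical names, and `ROOT_ID = 1` is a placeholder rather than HDFS's real root inode ID.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: rebuild an inode's full path from directory metadata
 * (directory id -> parent id and local name). The HashMap is a stand-in for
 * an on-disk store such as LevelDB; none of these names are HDFS APIs.
 */
public class PathResolver {
    static final long ROOT_ID = 1L; // placeholder id for "/"

    record DirEntry(long parentId, String name) {}

    private final Map<Long, DirEntry> dirs = new HashMap<>();

    void putDir(long id, long parentId, String name) {
        dirs.put(id, new DirEntry(parentId, name));
    }

    /** Walks parent pointers up to the root; each step is one random lookup. */
    String resolve(long parentId, String name) {
        Deque<String> parts = new ArrayDeque<>();
        parts.push(name);
        long cur = parentId;
        while (cur != ROOT_ID) {
            DirEntry e = dirs.get(cur);
            if (e == null) {
                throw new IllegalStateException("missing directory " + cur);
            }
            parts.push(e.name()); // ancestors end up at the head of the deque
            cur = e.parentId();
        }
        return "/" + String.join("/", parts);
    }

    public static void main(String[] args) {
        PathResolver r = new PathResolver();
        r.putDir(2L, ROOT_ID, "user");
        r.putDir(3L, 2L, "alice");
        System.out.println(r.resolve(3L, "data.csv")); // prints /user/alice/data.csv
    }
}
```

Each inode costs one lookup per path component, so when the store does not fit in memory, lookup latency (and hence SSD random-read performance) dominates.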
I would be very interested to see performance results on such a {{400M}}-inode
fsimage, if possible; they would definitely help me optimize this
patch.
bq. Tweak saver of the pb-based fsimage so that it stores the inodes using with
the order of the full path. It can be done without changing the format of the
current fsimage.
It would be much appreciated if this could be done.
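A minimal sketch of why a path-ordered saver would help, assuming "full-path order" means depth-first pre-order, where each directory is immediately followed by its entire subtree. (A plain string sort of paths is not quite this, since characters such as `-` sort before `/`.) Under that assumption, a reader can rebuild every path with a stack whose size is bounded by the tree depth, with no random lookups. `OrderedPathStream`, `InodeRec` and `ROOT_ID` are hypothetical names; `ROOT_ID = 1` is a placeholder.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch: if inodes arrive in depth-first pre-order, full paths
 * can be rebuilt with a stack of the current ancestors instead of a
 * random-access metadata store.
 */
public class OrderedPathStream {
    static final long ROOT_ID = 1L; // placeholder id for "/"

    record InodeRec(long id, long parentId, String name) {}

    private record Frame(long id, String path) {}

    static List<String> paths(List<InodeRec> preOrder) {
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame(ROOT_ID, ""));
        List<String> out = new ArrayList<>();
        for (InodeRec r : preOrder) {
            // Pop finished subtrees until this inode's parent is on top.
            while (stack.peek().id() != r.parentId()) {
                stack.pop();
                if (stack.isEmpty()) {
                    throw new IllegalStateException(
                        "input is not in pre-order at inode " + r.id());
                }
            }
            String path = stack.peek().path() + "/" + r.name();
            out.add(path);
            // Push every inode; a file is popped again by the next record.
            stack.push(new Frame(r.id(), path));
        }
        return out;
    }

    public static void main(String[] args) {
        List<InodeRec> recs = List.of(
            new InodeRec(2, ROOT_ID, "user"),
            new InodeRec(3, 2, "alice"),
            new InodeRec(4, 3, "a.txt"),
            new InodeRec(5, 2, "bob"),
            new InodeRec(6, ROOT_ID, "tmp"));
        // prints /user, /user/alice, /user/alice/a.txt, /user/bob, /tmp (one per line)
        paths(recs).forEach(System.out::println);
    }
}
```

The trade-off is that the ordering work moves to the saver, so the reader needs only sequential IO and memory proportional to the tree depth.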
> Add Delimited format supports for PB OIV tool
> ---------------------------------------------
>
> Key: HDFS-6673
> URL: https://issues.apache.org/jira/browse/HDFS-6673
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 2.4.0
> Reporter: Lei (Eddy) Xu
> Assignee: Lei (Eddy) Xu
> Priority: Minor
> Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch,
> HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch,
> HDFS-6673.005.patch, HDFS-6673.006.patch
>
>
> The new oiv tool, which is designed for the Protobuf fsimage, lacks a few
> features supported in the old {{oiv}} tool.
> This task adds support for the _Delimited_ processor to the new oiv tool.
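A minimal sketch of what a delimited-text processor emits: one line per inode, with fields joined by a configurable delimiter (tab in this sketch's usage). The column set below is an illustrative subset chosen for this example, not the tool's actual output schema, and `DelimitedLineSketch` and `FileRow` are hypothetical names.

```java
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical sketch of a delimited-text writer: one output line per inode,
 * fields separated by a configurable delimiter. The columns are an
 * illustrative subset, not the oiv tool's real schema.
 */
public class DelimitedLineSketch {
    record FileRow(String path, short replication, long fileSize,
                   String permission, String user, String group) {}

    static String toLine(FileRow row, String delimiter) {
        StringJoiner j = new StringJoiner(delimiter);
        j.add(row.path())
         .add(Short.toString(row.replication()))
         .add(Long.toString(row.fileSize()))
         .add(row.permission())
         .add(row.user())
         .add(row.group());
        return j.toString();
    }

    public static void main(String[] args) {
        List<FileRow> rows = List.of(
            new FileRow("/user/alice/a.txt", (short) 3, 1024L,
                        "-rw-r--r--", "alice", "staff"));
        for (FileRow r : rows) {
            System.out.println(toLine(r, "\t"));
        }
    }
}
```

Because the full path is the first column, the hard part of such a processor is producing that path for every inode; a real writer would also need a policy for paths that contain the delimiter itself.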
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)