[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool

Lei (Eddy) Xu (JIRA) Mon, 26 Jan 2015 15:27:06 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292617#comment-14292617
 ]


Lei (Eddy) Xu commented on HDFS-6673:
-------------------------------------

Thank you so much for the continued input on this issue, [~andrew.wang], 
[~wheat9] and [~cmccabe]!

I just want to add a bit more information about our design considerations.

bq. Convert the fsimage into LevelDB before running the oiv.

We do agree that an ordered fsimage in LevelDB can be scanned much faster than 
the approach used in the patch. However, our main concern with this approach, 
and the reason we gave up on it, is that writing a large fsimage (> 1GB) into 
LevelDB alone is several times slower (4-5x) than the end-to-end time of the 
latest patch. We believe the bottleneck is write amplification in LevelDB 
rather than in-memory computation (e.g., serialization), since we observed 
that the throughput of writing inodes to LevelDB keeps dropping 
_significantly_ once the db grows beyond 1GB. That is why we expect it to be 
much worse for even larger fsimages.

As another data point: for the 3.3GB (33M inodes) fsimage we currently test 
with, we have less than 300MB of metadata in LevelDB. If we assume that file 
distributions are similar among fsimages, we would have a {{2-3GB}} LevelDB 
for a {{20GB}} fsimage ({{6-8GB}} LevelDB for {{40GB}} (400M inodes)). The 
working set here is the {{6-8GB}} LevelDB, which is still arguably reasonable 
for today's laptop memory. Moreover, today's laptops have quite fast SSDs with 
decent random IO :)
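
To illustrate why a compact metadata store can be enough for Delimited 
output, here is a minimal sketch. It assumes (this comment does not spell out 
the actual layout) that the store maps each inode id to its parent id and 
name, so that a full path can be rebuilt by walking parent links. A plain 
{{HashMap}} stands in for LevelDB, and {{PathSketch}}, {{Entry}}, 
{{ROOT_ID}} and {{fullPath}} are hypothetical names, not code from the patch.

```java
import java.util.HashMap;
import java.util.Map;

public class PathSketch {
    // Hypothetical record: the parent inode id plus this inode's own name.
    static final class Entry {
        final long parentId;
        final String name;

        Entry(long parentId, String name) {
            this.parentId = parentId;
            this.name = name;
        }
    }

    // Hypothetical root inode id, chosen only for this sketch.
    static final long ROOT_ID = 1L;

    // Walk parent links up to the root and join the names into a full path.
    static String fullPath(Map<Long, Entry> store, long inodeId) {
        if (inodeId == ROOT_ID) {
            return "/";
        }
        StringBuilder sb = new StringBuilder();
        long cur = inodeId;
        while (cur != ROOT_ID) {
            Entry e = store.get(cur);
            if (e == null) {
                throw new IllegalStateException("missing inode " + cur);
            }
            sb.insert(0, "/" + e.name);
            cur = e.parentId;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In-memory stand-in for the on-disk metadata store (assumption).
        Map<Long, Entry> store = new HashMap<>();
        store.put(2L, new Entry(ROOT_ID, "user"));
        store.put(3L, new Entry(2L, "eddy"));
        store.put(4L, new Entry(3L, "data.txt"));

        // Emit one tab-delimited row: full path, then inode id.
        System.out.println(fullPath(store, 4L) + "\t" + 4L);
    }
}
```

Each lookup in such a scheme is a random read against the store, which is 
why the working-set size and random IO speed matter in the estimate above.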

I would be very interested to see performance results on such a 
{{400M}}-inode fsimage if possible, which would definitely help me optimize 
this patch. 

bq. Tweak saver of the pb-based fsimage so that it stores the inodes using with 
the order of the full path. It can be done without changing the format of the 
current fsimage.

It would be much appreciated if this could be done. 

> Add Delimited format supports for PB OIV tool
> ---------------------------------------------
>
>                 Key: HDFS-6673
>                 URL: https://issues.apache.org/jira/browse/HDFS-6673
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 2.4.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>            Priority: Minor
>         Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, 
> HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, 
> HDFS-6673.005.patch, HDFS-6673.006.patch
>
>
> The new oiv tool, which is designed for Protobuf fsimage, lacks a few 
> features supported in the old {{oiv}} tool. 
> This task adds supports of _Delimited_ processor to the oiv tool. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
