[jira] [Commented] (HDFS-8998) Small files storage supported inside HDFS

Lei (Eddy) Xu (JIRA) Mon, 14 Sep 2015 13:28:58 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-8998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744196#comment-14744196
 ]


Lei (Eddy) Xu commented on HDFS-8998:
-------------------------------------

Hi, [~zhangyongxyz] Thanks a lot for working on this.

I have a few more questions regarding your design.

* In the doc, you mentioned that the current HDFS small file designs 
(SequenceFile or HAR) have the following problems: ??bad opening performance, 
file deletions and access control??. Could you give a more explicit 
explanation of which problem(s) you are solving in this design?

If I understand correctly, this design offloads small file metadata from the 
"index file" in SequenceFile/HAR to inodes in the NN, so that it can keep files 
in blocks. Is that the case? Could you elaborate on the potential 
performance benefits, and on the workloads it is suitable for?

* You also mentioned that SequenceFile/HAR are intended for read-only use. 

Is this design optimized for writes? 

* bq. Small file INodes in small file zone has no structure changed.

Would INodes need to use an {{offset}} to track the start position in a block? 
Does the design doc imply that no metadata is stored on the DN? 
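
To make the {{offset}} question concrete, here is a minimal sketch of the kind of per-file bookkeeping I have in mind. It is only an illustration: {{SmallFileExtent}} and {{SharedBlockIndex}} are hypothetical names, not taken from the design doc or from existing HDFS code, and it assumes small files are packed back to back into one shared block.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the design doc or from HDFS.
final class SmallFileExtent {
    final long blockId; // shared block that holds the file's bytes
    final long offset;  // start position of the file inside that block
    final int length;   // file length in bytes

    SmallFileExtent(long blockId, long offset, int length) {
        this.blockId = blockId;
        this.offset = offset;
        this.length = length;
    }
}

// Packs small files back to back into one block and remembers where each starts.
final class SharedBlockIndex {
    private final long blockId;
    private final long capacity;
    private long nextOffset = 0;
    private final Map<String, SmallFileExtent> extents = new LinkedHashMap<>();

    SharedBlockIndex(long blockId, long capacity) {
        this.blockId = blockId;
        this.capacity = capacity;
    }

    // Returns the extent assigned to the file, or null if the path is already
    // present or the file does not fit in the remaining space.
    SmallFileExtent append(String path, int length) {
        if (length < 0 || extents.containsKey(path) || nextOffset + length > capacity) {
            return null;
        }
        SmallFileExtent extent = new SmallFileExtent(blockId, nextOffset, length);
        extents.put(path, extent);
        nextOffset += length;
        return extent;
    }

    SmallFileExtent lookup(String path) {
        return extents.get(path);
    }
}
```

If something like this (block id, offset, length) has to live somewhere, it would help to know whether it goes into the INode, a separate NN-side map, or the DN.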

One more question: should we keep track of deleted space in a block? Which 
server decides when a block is rewritten (compaction)? It would be nice to 
see some analysis of the time / space complexity of the design.
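
As an illustration of what tracking deleted space could look like, here is a rough sketch. {{BlockSpaceTracker}} and the dead-ratio threshold policy are my own assumptions for discussion, not something the design doc describes.

```java
// Hypothetical sketch: the class name and the threshold policy are assumptions,
// not taken from the design doc or from HDFS.
final class BlockSpaceTracker {
    private long writtenBytes = 0; // bytes ever appended to the block
    private long deadBytes = 0;    // bytes that belonged to deleted small files

    void recordAppend(long length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length");
        }
        writtenBytes += length;
    }

    void recordDelete(long length) {
        if (length < 0 || deadBytes + length > writtenBytes) {
            throw new IllegalArgumentException("invalid delete length");
        }
        deadBytes += length;
    }

    long liveBytes() {
        return writtenBytes - deadBytes;
    }

    // Fraction of the block occupied by deleted files (0.0 for an empty block).
    double deadRatio() {
        return writtenBytes == 0 ? 0.0 : (double) deadBytes / writtenBytes;
    }

    // A block becomes a compaction candidate once enough of it is dead space.
    boolean needsCompaction(double threshold) {
        return deadRatio() >= threshold;
    }
}
```

With counters like these, each delete costs O(1) bookkeeping, while rewriting a block costs time proportional to its live bytes. The open question is whether the NN or the DN would own the counters and trigger the rewrite.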

* bq. After background restructure or merge, small file will not support 
**truncate**

These background processes are transparent to end users. Users might get 
confused because these files can be truncated at some times but not at 
others. If truncate is not a common operation, can we move the truncated file to 
a new file? Speaking of this, it'd be nice to see the typical size of 
small files in your use cases.

I would appreciate it if you could also address these questions in the updated 
design. 



> Small files storage supported inside HDFS
> -----------------------------------------
>
>                 Key: HDFS-8998
>                 URL: https://issues.apache.org/jira/browse/HDFS-8998
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Yong Zhang
>            Assignee: Yong Zhang
>         Attachments: HDFS-8998.design.001.pdf
>
>
> HDFS has problems storing small files, as this blog post describes 
> (http://blog.cloudera.com/blog/2009/02/the-small-files-problem).
> The blog post also describes some ways to store small files in HDFS, but they 
> are not good ones; HAR files and Sequence Files seem better suited to 
> read-only files.
> Currently each HDFS block belongs to only one HDFS file, so if there are too 
> many small files, many small blocks will reside on the DataNode, which puts 
> the DataNode under heavy load.
> This jira will show how to merge small blocks into a big one online, how to 
> delete small files, and so on.
> Currently we have many open jiras for improving HDFS scalability on the 
> NameNode, such as HDFS-7836, HDFS-8286 and so on. 
> So small file metadata (INode and BlocksMap) will also be in the NameNode.
> The design document will be uploaded soon. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
