HDFS blocks are stored as files in the underlying filesystem of your
datanodes. Those files do not take a fixed amount of space, so if you
store 10 MB in a file and you have 128 MB blocks, you still only use
10 MB (times 3 with default replication).
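A quick back-of-the-envelope sketch of that point (the 10 MB / 128 MB numbers are from the thread; replication factor 3 is the HDFS default):

```python
# A 10 MB file in a cluster with 128 MB blocks and replication 3
# consumes disk proportional to its actual size, not the block size.
BLOCK_SIZE_MB = 128   # dfs.blocksize (illustrative value from the thread)
REPLICATION = 3       # dfs.replication default
file_size_mb = 10

# actual bytes written across the cluster: real size times replication
disk_used_mb = file_size_mb * REPLICATION

# what it would cost if every block were padded out to full block size
naive_guess_mb = BLOCK_SIZE_MB * REPLICATION

print(disk_used_mb)    # 30
print(naive_guess_mb)  # 384
```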

However, the namenode does incur additional memory overhead tracking a
large number of small files, because it keeps metadata for every file
and block in memory. So, if you can merge small files into larger ones,
it's best practice to do so.
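To make the overhead concrete, here's a rough sketch. The ~150 bytes of namenode heap per file/block object is a commonly cited rough estimate, not an exact figure, and the file counts below are made-up illustrative numbers:

```python
BYTES_PER_OBJECT = 150   # rough, commonly cited estimate of namenode heap per object
BLOCK_SIZE_MB = 128

def namenode_bytes(num_files, num_blocks):
    # each file contributes one file object; each block adds one more
    return (num_files + num_blocks) * BYTES_PER_OBJECT

# Scenario A: 1 million separate 10 MB files, one block apiece.
small_files = 1_000_000
small_heap = namenode_bytes(small_files, num_blocks=small_files)

# Scenario B: the same 10,000,000 MB of data merged into 1,000 large files.
total_mb = small_files * 10
merged_blocks = total_mb // BLOCK_SIZE_MB   # full 128 MB blocks for the same data
merged_heap = namenode_bytes(1_000, num_blocks=merged_blocks)

print(small_heap)   # ~300 MB of namenode heap
print(merged_heap)  # ~12 MB for the same data, merged
```

Same data on disk either way; the difference is purely namenode metadata.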

-Joey

On Tue, Sep 20, 2011 at 9:54 PM, hao.wang <[email protected]> wrote:
> Hi All:
>   I have lots of small files stored in HDFS. My HDFS block size is 128M. Each
> file is significantly smaller than the HDFS block size. I want to know
> whether each small file still consumes a full 128M block in HDFS?
>
> regards
> 2011-09-21
>
>
>
> hao.wang
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
