Hi Brendan,

Every file, directory, and block in HDFS is represented as an object in the namenode's memory, and each object occupies roughly 150 bytes. When you store many small files in HDFS, those objects eat up a large share of the namenode's heap, so the namespace becomes the bottleneck and the disk space ends up underutilized. As a rough illustration: 10 million small files, each occupying its own block, means at least 20 million namenode objects, or about 3 GB of heap spent on metadata alone.

If you want to handle small files, you should go for Hadoop SequenceFiles or HAR files, depending on your use case. HBase is also an option, but again, it depends on your use case. I would suggest you go through this blog post, a must-read for anyone managing a large number of small files: http://www.cloudera.com/blog/2009/02/the-small-files-problem/
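If you go the SequenceFile route, the usual pattern is to pack the small files in as key/value pairs, with the file name as the key and the raw bytes as the value. Below is a minimal sketch of that idea against the classic Hadoop API; the class name and paths are hypothetical, and it assumes the small files sit on the local disk of the machine running it:

import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // One big SequenceFile on HDFS instead of millions of tiny files.
        Path out = new Path("/user/brendan/packed.seq"); // hypothetical output path
        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, out, Text.class, BytesWritable.class);
        try {
            // args[0] is a local directory full of small files.
            for (File f : new File(args[0]).listFiles()) {
                byte[] data = Files.readAllBytes(f.toPath());
                // Key = original file name, value = raw file contents.
                writer.append(new Text(f.getName()), new BytesWritable(data));
            }
        } finally {
            writer.close();
        }
    }
}

Once packed, the namenode tracks only a handful of objects for the one big file, and a MapReduce job can stream the pairs and still recover each original file by its key. HAR files are the simpler alternative when the data is already on HDFS (the hadoop archive command builds one from an existing directory), but they are read-only once created.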
Regards,
Mohammad Tariq

On Tue, May 22, 2012 at 3:09 PM, Brendan cheng <ccp...@hotmail.com> wrote:
>
> Hi,
> I read the HDFS architecture doc and it said HDFS is tuned for storing large
> files, typically gigabytes to terabytes. What is the downside of storing millions
> of small files like <10MB? Or what settings of HDFS are suitable for storing
> small files?
> Actually, I plan to find a distributed file system for storing multiple millions
> of files.
> Brendan