Hi Brendan,

Every file, directory, and block in HDFS is represented as an object in the namenode's memory, and each object occupies roughly 150 bytes. When you store many small files in HDFS, those objects eat up a large share of the namenode's heap, so the namespace becomes the bottleneck and the disk space ends up underutilized. As a rough illustration: 10 million small files, each occupying its own block, means at least 20 million namenode objects, or about 3 GB of heap spent on metadata alone.

If you want to handle small files, you should go for Hadoop SequenceFiles or HAR files, depending on your use case. HBase is also an option, but again, it depends on your use case. I would suggest you go through this blog post, a must-read for anyone managing a large number of small files: http://www.cloudera.com/blog/2009/02/the-small-files-problem/
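If you go the SequenceFile route, the usual pattern is to pack the small files in as key/value pairs, with the file name as the key and the raw bytes as the value. Below is a minimal sketch of that idea against the classic Hadoop API; the class name and paths are hypothetical, and it assumes the small files sit on the local disk of the machine running it:

import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // One big SequenceFile on HDFS instead of millions of tiny files.
        Path out = new Path("/user/brendan/packed.seq"); // hypothetical output path
        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, out, Text.class, BytesWritable.class);
        try {
            // args[0] is a local directory full of small files.
            for (File f : new File(args[0]).listFiles()) {
                byte[] data = Files.readAllBytes(f.toPath());
                // Key = original file name, value = raw file contents.
                writer.append(new Text(f.getName()), new BytesWritable(data));
            }
        } finally {
            writer.close();
        }
    }
}

Once packed, the namenode tracks only a handful of objects for the one big file, and a MapReduce job can stream the pairs and still recover each original file by its key. HAR files are the simpler alternative when the data is already on HDFS (the hadoop archive command builds one from an existing directory), but they are read-only once created.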
Regards,
Mohammad Tariq

On Tue, May 22, 2012 at 3:09 PM, Brendan cheng <ccp...@hotmail.com> wrote:
>
> Hi,
> I read the HDFS architecture doc and it said HDFS is tuned for storing large
> files, typically gigabytes to terabytes. What is the downside of storing millions
> of small files like <10MB? Or what settings of HDFS are suitable for storing
> small files?
> Actually, I plan to find a distributed file system for storing multiple millions
> of files.
> Brendan