The short version is that the in-memory structures used by the NameNode are
"heavy" on a per-file basis and light on a per-block basis. So petabytes of
files that are each only a few hundred KB will require the NameNode to have a
huge amount of memory to hold the filesystem data structures, more than you
would want to pay to fit in a single server.
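
As a very rough back-of-envelope sketch (the ~150 bytes per namespace
object is an assumption; the real overhead varies by Hadoop version and
configuration):

public class NameNodeHeapEstimate {
    public static void main(String[] args) {
        long totalBytes = 1000L * 1000 * 1000 * 1000 * 1000; // 1 PB of images
        long avgFileBytes = 300L * 1024;                     // ~300 KB per image
        long files = totalBytes / avgFileBytes;              // ~3.3 billion files
        long objectsPerFile = 2;   // one file entry + one block entry (file < block size)
        long bytesPerObject = 150; // assumed per-object NameNode heap cost
        long heapBytes = files * objectsPerFile * bytesPerObject;
        System.out.printf("files: %,d  estimated NameNode heap: ~%d GB%n",
                files, heapBytes / (1000L * 1000 * 1000));
    }
}

That works out to on the order of a terabyte of NameNode heap for a
petabyte of 300 KB images, which is why the small-files case hurts.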

There are more details about this issue in an older post on our blog:
http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/
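
If I remember right, that post also covers workarounds such as packing
many small files into a single SequenceFile (filename as the key, image
bytes as the value), so the NameNode only has to track one large file.
A minimal, untested sketch of that idea (the paths below are made up,
and I'm using the older createWriter signature):

import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackImages {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/archives/images.seq");      // hypothetical HDFS output path

        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, Text.class, BytesWritable.class);
        try {
            for (File img : new File("/local/images").listFiles()) {  // hypothetical local dir
                byte[] buf = new byte[(int) img.length()];
                FileInputStream in = new FileInputStream(img);
                try {
                    int off = 0;
                    while (off < buf.length) {
                        int n = in.read(buf, off, buf.length - off);
                        if (n < 0) break;   // unexpected EOF
                        off += n;
                    }
                } finally {
                    in.close();
                }
                // key = original filename, value = raw image bytes
                writer.append(new Text(img.getName()), new BytesWritable(buf));
            }
        } finally {
            writer.close();
        }
    }
}

The trade-off is that you lose cheap random access to an individual
image by name unless you build an index on top of it.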

- Aaron


On Mon, Mar 30, 2009 at 12:46 AM, deepya <[email protected]> wrote:

>
> Hi,
>  Thanks.
>
>   Can you please specify in detail what kind of problems I will face if I
> use Hadoop for this project?
>
> SreeDeepya
>
>
> TimRobertson100 wrote:
> >
> > I believe Hadoop is not best suited to many small files like yours; it
> > is really geared to handling very large files that get split into
> > blocks (like 128 MB chunks), and HDFS is designed with this in mind.
> > Therefore I could *imagine* that there are other distributed file
> > systems that would far outperform HDFS if they were designed to
> > replicate and track small files without the *splitting* and *merging*
> > that Hadoop performs.
> >
> > Having not used MogileFS I can't really advise well, but a quick read
> > through does make it look like a candidate for you to consider - it
> > looks like it distributes files across machines and tracks replicas
> > like HDFS, without the splitting, and offers access to the individual
> > files over HTTP, which I could imagine would be ideal for pulling back
> > small images.
> >
> > Please don't just follow my advice though - I am still a relative
> > newbie to DFSs in general.
> >
> > Cheers
> >
> > Tim
> >
> >
> >
> > On Sun, Mar 29, 2009 at 12:51 PM, deepya <[email protected]> wrote:
> >>
> >> Hi,
> >>  I am doing a project on a scalable storage server for images. Can
> >> Hadoop efficiently support this purpose? Our images will be around 250
> >> to 300 KB each, but we have many such images, and the total storage may
> >> run up to petabytes in the future (at present it is in gigabytes).
> >>   We want to access these images via an Apache server. I mean, is there
> >> any mechanism by which we can talk directly to HDFS via the Apache
> >> server?
> >>
> >> I went through one of the posts here and learned that it is better to
> >> use the HDFS API rather than FUSE. That is fine, but they also
> >> mentioned that MogileFS would be more appropriate.
> >>
> >> Can someone please clarify why MogileFS is more appropriate? Can't
> >> Hadoop be used? How is MogileFS more advantageous? Can you suggest
> >> which filesystem would be more appropriate for the project I am doing
> >> at present?
> >>
> >> Thanks in advance
> >>
> >>
> >> SreeDeepya
> >
> >
>
>
>
