Re: Max number of files in HDFS?

Taeho Kang Tue, 28 Aug 2007 02:30:45 -0700

Hello Sameer. Thank you for your useful link. It's been very helpful!

By the way, our Hadoop cluster has a namenode with 4GBytes of RAM.


Based on the analysis in HADOOP-1687 (
http://issues.apache.org/jira/browse/HADOOP-1687), we could probably state,
to be conservative, that every 1GB of RAM lets the namenode manage about
1,000,000 files (10,600,000 files / 9GBytes = 1,177,777 files per
1GByte).

If I were to apply this "rule" to my 4GB RAM namenode, it should be able to
manage about 4,000,000 files.
The number of files being stored onto our Hadoop DFS is 5000~6000 files a
day. That gives about 670~800 days, assuming the number of files stored each
day stays at the current level. Unfortunately, it has been steadily going up
as more people in our company join the fun of using the Hadoop cluster.
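
For anyone who wants to redo this estimate with their own numbers, here is a
rough sketch of the arithmetic in Python. The function names are made up for
illustration, and it assumes the whole 4GB is available as namenode heap, that
the namenode starts out empty, and that the daily rate stays flat. None of
these is guaranteed to hold on a real cluster.

```python
# Figures quoted from the HADOOP-1687 analysis referenced above.
FILES_MEASURED = 10_600_000
HEAP_GB_MEASURED = 9

# Assumption: round the measured ratio down to a conservative 1,000,000 files/GB.
CONSERVATIVE_FILES_PER_GB = 1_000_000


def files_per_gb(files: int, heap_gb: float) -> float:
    """Measured ratio of files to namenode memory."""
    return files / heap_gb


def capacity(heap_gb: float, per_gb: int = CONSERVATIVE_FILES_PER_GB) -> int:
    """Estimated number of files a namenode with heap_gb of memory can track."""
    return int(heap_gb * per_gb)


def days_until_full(capacity_files: int, files_per_day: int,
                    existing_files: int = 0) -> float:
    """Days until the estimated capacity is reached at a constant daily rate."""
    return (capacity_files - existing_files) / files_per_day


if __name__ == "__main__":
    ratio = files_per_gb(FILES_MEASURED, HEAP_GB_MEASURED)
    print(f"measured ratio: {ratio:,.0f} files/GB")
    cap = capacity(4)  # 4GB namenode
    for rate in (5000, 6000):
        print(f"{rate} files/day -> {days_until_full(cap, rate):.0f} days")
```

This only counts files. As the quoted reply below points out, directories and
blocks also take up namenode memory, so the real headroom is likely smaller.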

Is there a plan to redesign the Namenode so that it doesn't have this
limit? (e.g. by using a DB for metadata management)

Thank you once again!

/Taeho




On 8/28/07, Sameer Paranjpye <[EMAIL PROTECTED]> wrote:
>
> How much memory does your Namenode machine have?
>
> You should look at the number of files, directories and blocks on your
> installation. All these numbers are available via NamenodeFsck.Result
>
> HADOOP-1687 ( http://issues.apache.org/jira/browse/HADOOP-1687) has a
> detailed discussion of the amount of memory used by Namenode data
> structures.
>
> Sameer
>
> Taeho Kang wrote:
> > Dear All,
> >
> > Hi, my name is Taeho and I am trying to figure out the maximum number of
> > files a namenode can hold.
> > The main reason for doing this is that I want to have some estimates on how
> > many files I can put into the HDFS without overflowing the Namenode
> > machine's memory.
> >
> > I know the number depends on the size of memory and how much is allocated
> > for the running JVM.
> > For the memory usage by the namenode, I can simply use the Runtime object of
> > the JDK.
> > For the total number of files residing in the DFS, I am thinking of using
> > the getTotalFiles() function of the NamenodeFsck class in the
> > org.apache.hadoop.dfs package. Am I correct here in using NamenodeFsck?
> >
> > Or, has anybody done similar experiments?
> >
> > Any comments/suggestions will be appreciated.
> > Thanks in advance.
> > Best Regards,
> >
>



-- 
Taeho Kang [tkang.blogspot.com]
