Taeho Kang wrote:
Hello Sameer. Thank you for your useful link. It's been very helpful!

By the way, our Hadoop cluster has a namenode with 4GBytes of RAM.

Based on the analysis in HADOOP-1687 (
http://issues.apache.org/jira/browse/HADOOP-1687), we could conservatively
state that every 1 GB of RAM gives the namenode the capacity to manage about
1,000,000 files (10,600,000 files / 9 GBytes ≈ 1,177,777 files per GByte).

If I were to apply this "rule" to my 4 GB RAM namenode, it should be able to
manage about 4,000,000 files.
We currently store 5,000~6,000 files a day onto our Hadoop DFS. That gives
about 700~800 days of headroom, assuming the number of files stored each day
stays at the current level. Unfortunately, it has been steadily going up as
more people in our company have joined the fun of using the Hadoop cluster.
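The back-of-the-envelope estimate above can be sketched as a few lines of Java. This is just illustrative arithmetic under the HADOOP-1687 assumption of roughly 1,000,000 files per GB of namenode heap; the class and method names are made up for this example.

```java
// Hypothetical helper illustrating the capacity arithmetic discussed above.
// Assumes the conservative HADOOP-1687 figure of ~1,000,000 files per GB of heap.
public class NamenodeCapacityEstimate {
    static final long FILES_PER_GB = 1_000_000L;

    // Estimated total files a namenode with the given heap can manage.
    static long maxFiles(long heapGb) {
        return heapGb * FILES_PER_GB;
    }

    // Days until the estimated capacity is reached at a constant daily rate.
    static long daysRemaining(long heapGb, long filesPerDay) {
        return maxFiles(heapGb) / filesPerDay;
    }

    public static void main(String[] args) {
        // 4 GB namenode, ~5,500 new files/day (midpoint of 5,000~6,000).
        System.out.println(maxFiles(4));            // 4000000
        System.out.println(daysRemaining(4, 5500)); // 727
    }
}
```

At 5,500 files/day the 4,000,000-file estimate runs out in about 727 days, which matches the 700~800-day range quoted above.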

Is there a plan to redesign the Namenode so that it doesn't have this
limit? (e.g. use a DB for metadata management)
Actually, there are several issues aiming to reduce namenode memory usage,
which will make it into the 0.15 release.
Thank you once again!

/Taeho




On 8/28/07, Sameer Paranjpye <[EMAIL PROTECTED]> wrote:
How much memory does your Namenode machine have?

You should look at the number of files, directories and blocks on your
installation. All these numbers are available via NamenodeFsck.Result

HADOOP-1687 ( http://issues.apache.org/jira/browse/HADOOP-1687) has a
detailed discussion of the amount of memory used by Namenode data
structures.

Sameer

Taeho Kang wrote:
Dear All,

Hi, my name is Taeho, and I am trying to figure out the maximum number of
files a namenode can hold.
The main reason for doing this is that I want some estimate of how many
files I can put into the HDFS without overflowing the Namenode machine's
memory.

I know the number depends on the size of memory and how much is allocated
to the running JVM.
For the memory usage of the namenode, I can simply use the Runtime object
of the JDK.
For the total number of files residing in the DFS, I am thinking of using
the getTotalFiles() function of the NamenodeFsck class in the
org.apache.hadoop.dfs package. Am I correct here in using NamenodeFsck?
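The Runtime-based measurement mentioned above can be sketched in a few lines. This is a minimal, self-contained example of `java.lang.Runtime` heap accounting; it does not touch any Hadoop classes, and the reported numbers reflect whatever JVM runs it.

```java
// Minimal sketch: measure the current JVM's heap usage via java.lang.Runtime.
public class HeapUsage {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // bytes in use right now
        long max  = rt.maxMemory();                     // -Xmx ceiling, in bytes
        System.out.printf("used: %d MB, max: %d MB%n",
                used / (1024 * 1024), max / (1024 * 1024));
    }
}
```

Note that `totalMemory()` only reports the heap the JVM has reserved so far; it grows toward `maxMemory()` as the namenode's file and block maps fill up, so sampling it periodically gives a rough growth trend.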

Or, has anybody done similar experiments?

Any comments/suggestions will be appreciated.
Thanks in advance.
Best Regards,



