Taeho Kang wrote:
Hello Sameer. Thank you for your useful link. It's been very helpful!
By the way, our Hadoop cluster has a namenode with 4 GB of RAM.
Based on the analysis found in HADOOP-1687
(http://issues.apache.org/jira/browse/HADOOP-1687), we could probably
state, to be conservative, that every 1 GB of RAM gives the namenode the
capacity to manage about 1,000,000 files (10,600,000 files / 9 GB ≈
1,177,777 files per GB).
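To spell that arithmetic out, here is a quick back-of-the-envelope sketch
in Java. It only restates the figures from HADOOP-1687 quoted above; the
class name is made up for illustration, and the 1,000,000 files/GB figure
is a conservative rounding, not an official constant:

    // Back-of-the-envelope estimate based on the HADOOP-1687 figures above.
    public class NamenodeCapacityEstimate {
        public static void main(String[] args) {
            long observedFiles = 10600000L; // files managed in the HADOOP-1687 analysis
            long heapGB = 9L;               // namenode heap size in that analysis
            System.out.println("Observed files per GB of heap: " + (observedFiles / heapGB));
            // => 1177777, rounded down to 1,000,000 per GB to stay conservative
        }
    }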
That analysis is based on a Hadoop 0.13 deployment. Hadoop 0.14
significantly improves on that by removing the .crc files the Namenode
had to track. If you have lots of small files in HDFS, this effectively
cuts memory usage in half. What version of Hadoop are you using?
HADOOP-1687 also outlines an approach to reduce Namenode memory usage
further, which could yield roughly a 40% improvement. This ought to be
done in time for Hadoop 0.15. Beyond that, the direction to take is
unclear and there hasn't been a lot of discussion. As and when it
happens, it'll show up on the bug list, so stay tuned.
If I were to apply this "rule" to my namenode with 4 GB of RAM, it should
be able to manage about 4,000,000 files.
We currently store 5,000~6,000 files a day in our Hadoop DFS. That gives
about 700~800 days of headroom, assuming the number of files stored each
day stays at the current level. Unfortunately, it has been steadily going
up as more people in our company have joined the fun of using the Hadoop
cluster.
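To make the runway estimate explicit, here is the same arithmetic as a
small Java sketch. All figures are the ones above (~4,000,000 files of
capacity, 5,000~6,000 new files per day); the class name is made up for
illustration:

    // Rough days-of-headroom estimate using the figures above.
    public class NamenodeRunwayEstimate {
        public static void main(String[] args) {
            long capacityFiles = 4L * 1000000L; // 4 GB heap * ~1,000,000 files per GB
            long filesPerDay = 5500L;           // midpoint of the 5,000~6,000 files/day rate
            System.out.println("Days of headroom: " + (capacityFiles / filesPerDay));
            // => about 727 days, i.e. the 700~800-day ballpark mentioned above
        }
    }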
Is there a plan to redesign the Namenode so that it doesn't have this
limit (e.g. by using a DB for metadata management)?
Thank you once again!
/Taeho
On 8/28/07, Sameer Paranjpye <[EMAIL PROTECTED]> wrote:
How much memory does your Namenode machine have?
You should look at the number of files, directories and blocks on your
installation. All these numbers are available via NamenodeFsck.Result.
HADOOP-1687 (http://issues.apache.org/jira/browse/HADOOP-1687) has a
detailed discussion of the amount of memory used by Namenode data
structures.
Sameer
Taeho Kang wrote:
> Dear All,
>
> Hi, my name is Taeho and I am trying to figure out the maximum number
> of files a namenode can hold.
> The main reason for doing this is that I want to have some estimates on
> how many files I can put into the HDFS without overflowing the Namenode
> machine's memory.
>
> I know the number depends on the size of memory and how much is
> allocated for the running JVM.
> For the memory usage by the namenode, I can simply use the Runtime
> object of the JDK.
> For the total number of files residing in the DFS, I am thinking of
> using the getTotalFiles() function of the NamenodeFsck class in the
> org.apache.hadoop.dfs package. Am I correct here in using NamenodeFsck?
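As an aside on the Runtime-based measurement mentioned above: it needs
nothing beyond the plain JDK, but it reports the heap of whatever JVM it
runs in, so it only reflects Namenode memory if run inside the Namenode
process. A minimal sketch, with a made-up class name:

    // Minimal heap-usage snapshot using only java.lang.Runtime (plain JDK).
    public class HeapSnapshot {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long usedBytes = rt.totalMemory() - rt.freeMemory(); // heap currently in use
            long maxBytes = rt.maxMemory();                      // -Xmx ceiling for this JVM
            System.out.println("Heap used: " + (usedBytes / (1024 * 1024)) + " MB of "
                    + (maxBytes / (1024 * 1024)) + " MB max");
        }
    }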
>
> Or, has anybody done similar experiments?
>
> Any comments/suggestions will be appreciated.
> Thanks in advance.
> Best Regards,
>
--
Taeho Kang [tkang.blogspot.com]