Thank you to all who have taken the time to answer my questions.

We've been using version 0.13. We do not have much of a problem right now,
but we will surely upgrade the system to 0.14 or 0.15 quite soon, once it's
released.

What are your opinions on implementing the namenode metadata management
using a DB? (maybe as a subproject?)
Do you think it would make the system more scalable, or is the additional
complexity of using a DB not worth considering?

/Taeho

On 8/29/07, Sameer Paranjpye <[EMAIL PROTECTED]> wrote:
>
>
>
> Taeho Kang wrote:
> > Hello Sameer. Thank you for your useful link. It's been very helpful!
> >
> > By the way, our Hadoop cluster has a namenode with 4GBytes of RAM.
> >
> > Based on the analysis found in HADOOP-1687 (
> > http://issues.apache.org/jira/browse/HADOOP-1687), we could probably
> > state that, to be conservative, every 1GByte of RAM gives the namenode
> > the capacity to manage 1,000,000 files (10,600,000 files / 9GBytes ≈
> > 1,177,777 files per GByte).
>
> That analysis is based on a Hadoop 0.13 deployment. Hadoop 0.14
> significantly improves on that by removing .crc files in the Namenode.
> If you have lots of small files in HDFS it will effectively cut memory
> usage in half. What version of Hadoop are you using?
>
> HADOOP-1687 also outlines an approach to further reduce memory usage in
> the Namenode that could show a further 40% improvement. This ought to be
> done in time for Hadoop 0.15. Beyond that the direction to take is
> unclear and there hasn't been a lot of discussion. As and when it
> happens it'll show up on the bug list, so stay tuned.
>
> >
> > If I were to apply this "rule" to my 4GB RAM namenode, it should have
> > the ability to manage 4,000,000 files.
> > The number of files being stored in our Hadoop DFS is 5000~6000 files
> > a day. That gives about 700~800 days, assuming the number of files
> > stored each day stays at the current level. Unfortunately, it has been
> > steadily going up as more people in our company have joined the fun of
> > using the Hadoop cluster.
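> >
> > As a quick sanity check, here is a minimal back-of-the-envelope sketch
> > of that estimate in Java. The files-per-GByte figure and the daily
> > ingest rate are assumptions taken from the numbers above, not values
> > reported by the Namenode:
> >
> >     // Rough headroom estimate based on the HADOOP-1687 analysis quoted above.
> >     public class NamenodeHeadroom {
> >         public static void main(String[] args) {
> >             long filesPerGb = 1000000L;   // assumed, conservative (HADOOP-1687)
> >             long namenodeRamGb = 4L;      // our namenode has 4GB of RAM
> >             long filesPerDay = 5500L;     // assumed midpoint of 5000~6000 files/day
> >
> >             long capacity = filesPerGb * namenodeRamGb;  // ~4,000,000 files
> >             long daysLeft = capacity / filesPerDay;      // ~727 days
> >
> >             System.out.println("Estimated file capacity: " + capacity);
> >             System.out.println("Days until full at current rate: " + daysLeft);
> >         }
> >     }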
> >
> > Is there a plan to redesign the Namenode so that it doesn't have this
> > limit? (e.g. use a DB for metadata management)
> >
> > Thank you once again!
> >
> > /Taeho
> >
> >
> >
> >
> > On 8/28/07, *Sameer Paranjpye* <[EMAIL PROTECTED]
> > <mailto:[EMAIL PROTECTED]>> wrote:
> >
> >     How much memory does your Namenode machine have?
> >
> >     You should look at the number of files, directories and blocks on
> >     your installation. All these numbers are available via
> >     NamenodeFsck.Result.
> >
> >     HADOOP-1687 (http://issues.apache.org/jira/browse/HADOOP-1687) has
> >     a detailed discussion of the amount of memory used by Namenode data
> >     structures.
> >
> >     Sameer
> >
> >     Taeho Kang wrote:
> >      > Dear All,
> >      >
> >      > Hi, my name is Taeho and I am trying to figure out the maximum
> >      > number of files a namenode can hold.
> >      > The main reason for doing this is that I want to have some
> >      > estimates on how many files I can put into the HDFS without
> >      > overflowing the Namenode machine's memory.
> >      >
> >      > I know the number depends on the size of memory and how much is
> >      > allocated for the running JVM.
> >      > For the memory usage by the namenode, I can simply use the JDK's
> >      > Runtime object.
> >      > For the total number of files residing in the DFS, I am thinking
> >      > of using the getTotalFiles() function of the NamenodeFsck class
> >      > in the org.apache.hadoop.dfs package. Am I correct here in using
> >      > NamenodeFsck?
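> >      >
> >      > A minimal sketch of the measurement I have in mind, in Java. The
> >      > Runtime part is standard JDK; whether NamenodeFsck is the right
> >      > place to get the total file count is exactly what I am asking
> >      > about, so that part is left out:
> >      >
> >      >     // Approximate heap usage of the JVM this code runs in; to measure
> >      >     // the namenode it would have to run inside the namenode process.
> >      >     public class HeapUsage {
> >      >         public static void main(String[] args) {
> >      >             Runtime rt = Runtime.getRuntime();
> >      >             long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
> >      >             long maxMb = rt.maxMemory() / (1024 * 1024);
> >      >             System.out.println("Used heap: " + usedMb + " MB, max heap: " + maxMb + " MB");
> >      >         }
> >      >     }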
> >      >
> >      > Or, has anybody done similar experiments?
> >      >
> >      > Any comments/suggestions will be appreciated.
> >      > Thanks in advance.
> >      > Best Regards,
> >      >
> >
> >
> >
> >
> > --
> > Taeho Kang [ tkang.blogspot.com <http://tkang.blogspot.com>]
>



-- 
Taeho Kang [tkang.blogspot.com]
Software Engineer, NHN Corporation, Korea
