Thank you to everyone who has taken the time to answer my questions. We've been using version 0.13. We do not have much of a problem right now, but we will surely upgrade the system quite soon to 0.14, or to 0.15 once it's released.
What are your opinions on implementing the namenode metadata management using a DB (maybe as a subproject)? Do you think it would make the system more scalable, or would the additional complexity of using a DB not be worth it?

/Taeho

On 8/29/07, Sameer Paranjpye <[EMAIL PROTECTED]> wrote:
>
> Taeho Kang wrote:
> > Hello Sameer. Thank you for your useful link. It's been very helpful!
> >
> > By the way, our Hadoop cluster has a namenode with 4 GBytes of RAM.
> >
> > Based on the analysis found in HADOOP-1687
> > (http://issues.apache.org/jira/browse/HADOOP-1687), we could probably
> > state that, to be conservative, every 1 GB of RAM gives the namenode
> > the capacity to manage 1,000,000 files (10,600,000 files / 9 GBytes =
> > 1,177,777 files / 1 GByte).
>
> That analysis is based on a Hadoop 0.13 deployment. Hadoop 0.14
> significantly improves on that by removing .crc files in the Namenode.
> If you have lots of small files in HDFS it will effectively cut memory
> usage in half. What version of Hadoop are you using?
>
> HADOOP-1687 also outlines an approach to further reduce memory usage in
> the Namenode that could show a further 40% improvement. This ought to
> be done in time for Hadoop 0.15. Beyond that the direction to take is
> unclear and there hasn't been a lot of discussion. As and when it
> happens it'll show up on the bug list, so stay tuned.
>
> > If I were to apply this "rule" to my 4 GB RAM namenode, it should be
> > able to manage 4,000,000 files.
> > The number of files being stored onto our Hadoop DFS is 5000~6000
> > files a day. That gives about 700~800 days, assuming the number of
> > files stored each day stays at the current level. Unfortunately, it
> > has been steadily going up as more people in our company have joined
> > the fun of using the Hadoop cluster.
> >
> > Is there a plan to redesign the Namenode in a way that it doesn't
> > have this limit? (e.g. use a DB for metadata management)
> >
> > Thank you once again!
> >
> > /Taeho
> >
> > On 8/28/07, Sameer Paranjpye <[EMAIL PROTECTED]> wrote:
> >
> > How much memory does your Namenode machine have?
> >
> > You should look at the number of files, directories and blocks on
> > your installation. All these numbers are available via
> > NamenodeFsck.Result.
> >
> > HADOOP-1687 (http://issues.apache.org/jira/browse/HADOOP-1687) has a
> > detailed discussion of the amount of memory used by Namenode data
> > structures.
> >
> > Sameer
> >
> > Taeho Kang wrote:
> > > Dear All,
> > >
> > > Hi, my name is Taeho and I am trying to figure out the maximum
> > > number of files a namenode can hold.
> > > The main reason for doing this is that I want to have some estimates
> > > of how many files I can put into the HDFS without overflowing the
> > > Namenode machine's memory.
> > >
> > > I know the number depends on the size of memory and how much is
> > > allocated to the running JVM.
> > > For the memory usage by the namenode, I can simply use the Runtime
> > > object of the JDK.
> > > For the total number of files residing in the DFS, I am thinking of
> > > using the getTotalFiles() function of the NamenodeFsck class in the
> > > org.apache.hadoop.dfs package. Am I correct here in using
> > > NamenodeFsck?
> > >
> > > Or, has anybody done similar experiments?
> > >
> > > Any comments/suggestions will be appreciated.
> > > Thanks in advance.
> > > Best Regards,
> >
> > --
> > Taeho Kang [tkang.blogspot.com]
>

--
Taeho Kang [tkang.blogspot.com]
Software Engineer, NHN Corporation, Korea
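
P.S. For anyone who wants to reproduce the back-of-the-envelope estimate
discussed above, here is a small sketch in Java. The only inputs are the
figures quoted in this thread (the conservative 1,000,000 files per GB
derived from HADOOP-1687, our 4 GB Namenode heap, and the 5,000~6,000 new
files stored per day); the class name and everything else are purely
illustrative and not part of Hadoop.

// Back-of-the-envelope Namenode capacity estimate using the figures
// quoted in this thread. Illustrative only -- not part of any Hadoop API.
public class NamenodeCapacityEstimate {
    public static void main(String[] args) {
        // HADOOP-1687 (measured on 0.13): ~10,600,000 files in ~9 GB of heap
        double measuredFilesPerGb = 10600000.0 / 9.0;   // ~1,177,777 files per GB
        double conservativeFilesPerGb = 1000000.0;      // rounded down, to be safe

        double heapGb = 4.0;                            // our Namenode has 4 GB of RAM
        double capacityFiles = conservativeFilesPerGb * heapGb; // ~4,000,000 files

        double newFilesPerDay = 5500.0;                 // 5,000~6,000 files stored per day
        double daysLeft = capacityFiles / newFilesPerDay;       // ~727 days at this rate

        System.out.printf("Measured ratio     : %,.0f files/GB%n", measuredFilesPerGb);
        System.out.printf("Estimated capacity : %,.0f files%n", capacityFiles);
        System.out.printf("Headroom           : %.0f days%n", daysLeft);
    }
}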

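And a minimal sketch of the JVM heap check mentioned in the original
question, using only java.lang.Runtime (no Hadoop classes). Running
something like this inside the Namenode JVM shows how much of the
configured heap is actually in use; the class name is again just an
illustration.

// Minimal JVM heap usage check using only java.lang.Runtime.
// A sketch of the approach mentioned above, not actual Namenode code.
public class HeapUsage {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long usedBytes = rt.totalMemory() - rt.freeMemory(); // heap currently occupied by objects
        long maxBytes  = rt.maxMemory();                     // the -Xmx ceiling for this JVM
        System.out.printf("Heap used: %d MB of %d MB max%n",
                usedBytes / (1024 * 1024), maxBytes / (1024 * 1024));
    }
}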