Running fsck and invoking getTotalFiles() seems to be the right way to figure out the total number of files in the DFS.
Thanks,
dhruba

-----Original Message-----
From: Taeho Kang [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 27, 2007 11:59 PM
To: [email protected]
Cc: [EMAIL PROTECTED]
Subject: Max number of files in HDFS?

Dear All,

Hi, my name is Taeho and I am trying to figure out the maximum number of files a namenode can hold. The main reason for doing this is that I want to have some estimate of how many files I can put into HDFS without overflowing the namenode machine's memory. I know the number depends on the size of memory and how much of it is allocated to the running JVM.

For the memory usage of the namenode, I can simply use the Runtime object of the JDK. For the total number of files residing in the DFS, I am thinking of using the getTotalFiles() function of the NamenodeFsck class in the org.apache.hadoop.dfs package.

Am I correct here in using NamenodeFsck? Or has anybody done similar experiments? Any comments/suggestions will be appreciated.

Thanks in advance.

Best Regards,

--
Taeho Kang
Software Engineer, NHN Corporation, Seoul, South Korea
Homepage : tkang.blogspot.com
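
A note on the fsck approach suggested above: since NamenodeFsck is an internal class, the same count should also be reachable from the command line; the summary printed by something like

    bin/hadoop fsck /

should include a "Total files" line alongside directory and block counts, which avoids depending on internal APIs.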

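For the memory side, a minimal sketch of the Runtime-based measurement Taeho describes might look like the following; the class name HeapUsageProbe is illustrative, and it assumes the code runs inside the namenode's JVM (for example, hooked into namenode startup for an experiment):

// HeapUsageProbe.java - a sketch of measuring JVM heap usage with
// java.lang.Runtime, as described in the message above. Not namenode-
// specific; it reports on whatever JVM it happens to run in.
public class HeapUsageProbe {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Hint a GC first so the numbers reflect live objects rather than garbage.
        rt.gc();
        long used = rt.totalMemory() - rt.freeMemory(); // bytes currently in use
        long max  = rt.maxMemory();                     // ceiling set by -Xmx
        System.out.printf("Heap used: %d MB of %d MB max%n",
                used >> 20, max >> 20);
    }
}

Dividing the used-heap figure by the file count reported by fsck gives a rough bytes-per-file estimate, which can then be extrapolated against the configured -Xmx to bound how many files the namenode can hold.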