Hi On,

The namenode stores the full filesystem image in memory. Looking at your stats, you have ~30 million files/directories and ~47 million blocks. That means that on average, each of your files spans only ~1.4 blocks. One way to lower the pressure on the namenode would be to store fewer, larger files. If you're able to concatenate files and still parse them, great. Otherwise, Hadoop provides a couple of container file formats that might help.
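To make that arithmetic concrete, here's a quick back-of-envelope sketch using the numbers from your cluster summary. Note the "bytes per object" figure at the end is just your observed heap divided by your object count, a rough ratio for sizing, not an official NameNode internals constant:

```java
// Back-of-envelope sizing from the cluster summary quoted below.
public class NamenodeMath {
    public static void main(String[] args) {
        long filesAndDirs = 33050825L;            // files + directories
        long blocks       = 47708724L;
        long objects      = filesAndDirs + blocks; // 80759549 total

        // Average blocks per file/directory entry: ~1.44
        double blocksPerFile = (double) blocks / filesAndDirs;

        // Observed heap: 22.93 GB, fully used per the summary report.
        double heapBytes      = 22.93 * 1024 * 1024 * 1024;
        double bytesPerObject = heapBytes / objects;  // ~300 bytes/object

        System.out.printf("avg blocks per file: %.2f%n", blocksPerFile);
        System.out.printf("observed heap per namenode object: ~%.0f bytes%n",
                          bytesPerObject);
    }
}
```

The takeaway: heap scales with the total object count (files + directories + blocks), so halving the number of files and blocks roughly halves the metadata footprint.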
SequenceFiles are Hadoop-specific binary files that store key/value pairs. If your data fits that model, you can convert the data into SequenceFiles when you write it to HDFS, including data from multiple input files in a single SequenceFile. Here is a simple example of using the SequenceFile API:

http://programmer-land.blogspot.com/2009/04/hadoop-sequence-files.html

Another option is Hadoop Archive files (HARs). A HAR file lets you combine multiple smaller files into a virtual filesystem. Here are some links with details on HARs:

http://developer.yahoo.com/blogs/hadoop/posts/2010/07/hadoop_archive_file_compaction/
http://hadoop.apache.org/mapreduce/docs/current/hadoop_archives.html

If you're able to use any of these techniques to grow your average file size, then you can also save memory by increasing the block size. The default block size is 64MB; most clusters I've been exposed to run at 128MB.

-Joey

On Fri, Jun 10, 2011 at 7:45 AM, si...@ugcv.com <si...@ugcv.com> wrote:
> Dear all,
>
> I'm looking for ways to improve the namenode heap size usage of an 800-node,
> 10PB testing Hadoop cluster that stores around 30 million files.
>
> Here's some info:
>
> 1 x namenode: 32GB RAM, 24GB heap size
> 800 x datanode: 8GB RAM, 13TB hdd
>
> 33050825 files and directories, 47708724 blocks = 80759549 total.
> Heap Size is 22.93 GB / 22.93 GB (100%)
>
> From the cluster summary report, it seems the heap usage is always full and
> never drops. Do you know of any ways to reduce it? So far I don't see any
> namenode OOM errors, so it looks like the memory assigned to the namenode
> process is (just) enough. But I'm curious which factors account for the full
> use of the heap?
>
> Regards,
> On

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434