On Jun 10, 2011, at 6:32 AM, [email protected] wrote:

> Dear all,
>
> I'm looking for ways to improve the namenode heap size usage of an 800-node,
> 10PB testing Hadoop cluster that stores around 30 million files.
>
> Here's some info:
>
> 1 x namenode: 32GB RAM, 24GB heap size
> 800 x datanode: 8GB RAM, 13TB hdd
>
> 33050825 files and directories, 47708724 blocks = 80759549 total.
> Heap Size is 22.93 GB / 22.93 GB (100%)
>
> From the cluster summary report, it seems the heap usage is always full and
> never drops. Do you know of any ways to reduce it? So far I don't see any
> namenode OOM errors, so it looks like the memory assigned to the namenode
> process is (just) enough. But I'm curious: which factors account for the
> heap being completely used?
The advice I give to folks is to plan on 1GB of heap for every million objects (files, directories, and blocks). It's an over-estimate, but I prefer to be on the safe side. By that rule of thumb, your ~80 million objects would call for something closer to 80GB, so a 24GB heap sitting at 100% is not surprising.

Why not increase the heap size to 28GB? That should buy you some time. You can also turn on compressed pointers, but your best bet is really going to be spending some more money on RAM.

Brian
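P.S. For concreteness, here is roughly the change I have in mind. This assumes a 1.x-style conf/hadoop-env.sh on the namenode host; the file location and your existing defaults may differ, so take it as a sketch rather than a drop-in:

  # conf/hadoop-env.sh on the namenode
  # Give the namenode its own JVM options instead of relying on the global
  # HADOOP_HEAPSIZE. -Xmx28g raises the max heap to 28GB, leaving ~4GB of the
  # 32GB box for the OS. -XX:+UseCompressedOops enables compressed pointers,
  # which only help for heaps under roughly 32GB.
  export HADOOP_NAMENODE_OPTS="-Xmx28g -XX:+UseCompressedOops ${HADOOP_NAMENODE_OPTS}"

Restart the namenode after the change so the new options take effect.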
