Gaojinchao, I'm not certain, but this looks a lot like some of the issues I've been dealing with lately (namely, non-Java-heap memory leakage).
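Before anything else, it may be worth confirming where the growth actually is. A rough sketch along the lines of the following (it assumes Java 7+, which exposes BufferPoolMXBean through java.lang.management; the class name DirectMemoryCheck is just for illustration) will print heap, non-heap, and NIO buffer-pool usage from inside the JVM:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.List;

public class DirectMemoryCheck {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        System.out.println("Heap used:     " + mem.getHeapMemoryUsage().getUsed());
        System.out.println("Non-heap used: " + mem.getNonHeapMemoryUsage().getUsed());

        // The "direct" pool tracks memory held by direct (NIO) ByteBuffers,
        // which live outside the Java heap.
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.println(pool.getName() + ": count=" + pool.getCount()
                    + " used=" + pool.getMemoryUsed()
                    + " capacity=" + pool.getTotalCapacity());
        }
    }
}

If the "direct" pool is what keeps climbing while the heap numbers stay flat, that points at NIO buffers rather than the heap itself.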
First, -XX:MaxDirectMemorySize doesn't seem to be a solution. The flag is poorly documented, and in any case the problem appears to be one of releasing/reclaiming resources rather than over-allocating them. See
http://bugs.sun.com/bugdatabase/view_bug.do;jsessionid=ae283c11508fb97ede5fe27a1554b?bug_id=4469299

Second, you may wish to experiment with "-XX:+UseParallelGC -XX:+UseParallelOldGC" rather than the CMS GC. I have been trying this recently on some of my app servers and Hadoop servers, and it certainly does fix the problem of non-Java-heap growth. The usual concern with the parallel collector is that full GCs (which, it would seem, are what actually clear the non-heap memory) take too long. Personally, I consider that reasoning fallacious: a full GC is bound to occur sooner or later, and under CMS with this bug in effect it can be fatal (and even without the bug, CMS uses a single thread for a full GC, AFAIK). The numbers for the parallel collector on a 2 GB heap are not terrible, even without tuning and even on old processors: max pause 2.8 sec and average pause 1 sec for a full GC, with minor collections outnumbering major ones at least 3:1 and total overhead around 1.3%. If your application can tolerate a second or two of latency once in a while, you can switch to ParallelOldGC and call it a day.

The fact that some installations are trying to deal with ~24 GB heaps sounds like a design issue to me; HBase and Hadoop are already designed to scale horizontally, and the emphasis on scaling vertically just because the hardware comes in a certain size seems misguided. But not having that hardware, I might be missing something.

Finally, you might look at changing the vm.swappiness parameter in the Linux kernel (I think it's set in sysctl.conf). I have set swappiness to 0 on my servers, and I'm happy with it. I don't know the exact mechanism, but there certainly appears to be some memory-pressure feedback between the kernel and the JVM. Perhaps it has to do with the total commit charge appearing lower (just physical instead of physical + swap) when swappiness is low. I'd love to hear from someone with a deep understanding of OS memory allocation about this.

Hope this helps,
Sandy

> -----Original Message-----
> From: Gaojinchao [mailto:gaojinc...@huawei.com]
> Sent: Saturday, December 03, 2011 19:58
> To: u...@hbase.apache.org; dev@hbase.apache.org
> Cc: Chenjian; wenzaohua
> Subject: FeedbackRe: Suspected memory leak
>
> Thank you for your help.
>
> This issue appears to be a configuration problem:
> 1. The HBase client uses the NIO (socket) API, which uses direct memory.
> 2. The default -XX:MaxDirectMemorySize value is equal to the -Xmx value, so if
> there is no full GC, the direct memory cannot be reclaimed. Unfortunately, the
> GC configuration of our client never produces a full GC.
>
> This is only a preliminary result. All tests are still running; if we have any
> further results, we will feed them back.
> Finally, I will update our story in issue
> https://issues.apache.org/jira/browse/HBASE-4633.
>
> If our digging is correct, should we set a default value for
> -XX:MaxDirectMemorySize to prevent this situation?
>
> Thanks
>
> -----Original Message-----
> From: bijieshan [mailto:bijies...@huawei.com]
> Sent: December 2, 2011 15:37
> To: dev@hbase.apache.org; u...@hbase.apache.org
> Cc: Chenjian; wenzaohua
> Subject: Re: Suspected memory leak
>
> Thank you all.
> I think it's the same problem as the one in the link provided by Stack, because
> the heap size is stable but the non-heap size keeps growing.
> So I think it's not the problem of the CMS GC bug.
> We also know the content of the problematic memory section; all the records
> contain info like the following:
> "|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||"
> "BBZHtable_UFDR_058,048342220093168-02570"
> ........
>
> Jieshan.
>
> -----Original Message-----
> From: Kihwal Lee [mailto:kih...@yahoo-inc.com]
> Sent: December 2, 2011 4:20
> To: dev@hbase.apache.org
> Cc: Ramakrishna s vasudevan; u...@hbase.apache.org
> Subject: Re: Suspected memory leak
>
> Adding to the excellent write-up by Jonathan:
> Since a finalizer is involved, it takes two GC cycles to collect these objects.
> Due to a bug (or bugs) in the CMS GC, the collection may not happen and the
> heap can grow really big. See
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.
>
> Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the
> socket-related objects were being collected properly. This option forces the
> concurrent marker to run as a single thread. This was for HDFS, but I think
> the same applies here.
>
> Kihwal
>
> On 12/1/11 1:26 PM, "Stack" <st...@duboce.net> wrote:
>
> Make sure it's not the issue that Jonathan Payne identified a while back:
> https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
>
> St.Ack
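
P.S. Regarding the NIO/direct-memory explanation in the quoted messages above, the following is a purely illustrative sketch (not code from HBase or this thread) of the mechanism as I understand it: a direct ByteBuffer holds its large allocation outside the Java heap, while the heap only sees a small wrapper object, so ordinary heap pressure may never trigger the collection that actually frees the native side. Run it with a small -Xmx and watch the process's resident size (or the buffer-pool output from the earlier sketch).

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DirectBufferGrowth {
    public static void main(String[] args) throws Exception {
        List<ByteBuffer> live = new ArrayList<ByteBuffer>();
        for (int i = 0; i < 2000; i++) {
            // Roughly 1 MB of native memory per buffer, but only a tiny
            // DirectByteBuffer object on the Java heap.
            live.add(ByteBuffer.allocateDirect(1024 * 1024));
            if (live.size() > 100) {
                // Dropping our reference does not free the native memory by
                // itself; it is only released once a GC actually collects the
                // buffer object (for buffers that have been promoted to the
                // old generation, that effectively means a full/old-gen
                // collection).
                live.remove(0);
            }
            Thread.sleep(5);
        }
    }
}

This is why the heap can look flat while the process keeps growing: the thing that would release the native memory is exactly the GC activity that never happens.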