Jinchao:
Since we found the workaround, can you summarize the following statistics on HBASE-4633?

Thanks
2011/12/4 Gaojinchao <gaojinc...@huawei.com>

Yes, I have tested it; the system is fine. Roughly every hour a full GC is triggered:

10022.210: [Full GC (System) 10022.210: [Tenured: 577566K->257349K(1048576K), 1.7515610 secs] 9651924K->257349K(14260672K), [Perm : 19161K->19161K(65536K)], 1.7518350 secs] [Times: user=1.75 sys=0.00, real=1.75 secs]
.........
.........
13532.930: [GC 13532.931: [ParNew: 12801558K->981626K(13212096K), 0.1414370 secs] 13111752K->1291828K(14260672K), 0.1416880 secs] [Times: user=1.90 sys=0.01, real=0.14 secs]
13624.630: [Full GC (System) 13624.630: [Tenured: 310202K->175378K(1048576K), 1.9529280 secs] 11581276K->175378K(14260672K), [Perm : 19225K->19225K(65536K)], 1.9531660 secs] [Times: user=1.94 sys=0.00, real=1.96 secs]

7543 root 20 0 17.0g 15g 9892 S 0 32.9 1184:34 java
7543 root 20 0 17.0g 15g 9892 S 1 32.9 1184:34 java

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: December 5, 2011 9:06
To: dev@hbase.apache.org
Subject: Re: FeedbackRe: Suspected memory leak

Can you try specifying -XX:MaxDirectMemorySize with a moderate value and see if the leak gets under control?

Thanks

2011/12/4 Gaojinchao <gaojinc...@huawei.com>

I have attached the stack in https://issues.apache.org/jira/browse/HBASE-4633.
I will update our story.

-----Original Message-----
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: December 5, 2011 7:37
To: dev@hbase.apache.org; lars hofhansl
Subject: Re: FeedbackRe: Suspected memory leak

I looked through TRUNK and 0.90 code but didn't find HBaseClient.Connection.setParam(). The method should be sendParam().

When I was in China I tried to access Jonathan's post but wasn't able to.

If Jinchao's stack trace resonates with the one Jonathan posted, we should consider using netty for HBaseClient.

Cheers

On Sun, Dec 4, 2011 at 1:12 PM, lars hofhansl <lhofha...@yahoo.com> wrote:

I think HBASE-4508 is unrelated.
The "connections" I'm referring to are HBaseClient.Connection objects (not HConnections).
It turns out that HBaseClient.Connection.setParam is actually called directly by the client threads, which means we can get an unlimited amount of DirectByteBuffers (until we get a full GC).

The JDK will cache 3 per thread, each with a size sufficient to serve the IO. So sending some large requests from many threads will lead to OOM.

I think that was a related thread that Stack forwarded a while back from the asynchbase mailing lists.

Jinchao, could you add a text version (not a png image, please :-) ) of this to the jira?

-- Lars

----- Original Message -----
From: Ted Yu <yuzhih...@gmail.com>
To: dev@hbase.apache.org; lars hofhansl <lhofha...@yahoo.com>
Cc: Gaojinchao <gaojinc...@huawei.com>; Chenjian <jean.chenj...@huawei.com>; wenzaohua <wenzao...@huawei.com>
Sent: Sunday, December 4, 2011 12:43 PM
Subject: Re: FeedbackRe: Suspected memory leak

I think Jinchao wasn't using HBASE-4508 in his 0.90 distribution because 0.90.5 hasn't been released.
Assuming the NIO consumption is related to the number of connections from the client side, it would help to perform benchmarking on 0.90.5.

Jinchao:
Please attach the stack trace to HBASE-4633 so that we can verify our assumptions.

Thanks
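A self-contained Java sketch of the failure mode described in the messages above: when many client threads each write a large heap ByteBuffer to a blocking SocketChannel, the JDK copies each write into a per-thread cached direct buffer sized to the request, so direct memory grows with (threads x request size) and is returned only by a GC. Everything here (class name, thread count, request size, the local echo sink) is illustrative and assumes a Java 8+ JVM; it is not HBase code.

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class PerThreadDirectBuffers {

    public static void main(String[] args) throws Exception {
        final int threads = 20;
        final int requestSize = 4 * 1024 * 1024; // pretend each RPC carries 4 MB

        // Local sink that drains whatever the "client" threads send.
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        final int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    SocketChannel c = server.accept();
                    Thread reader = new Thread(() -> {
                        ByteBuffer sink = ByteBuffer.allocate(64 * 1024);
                        try { while (c.read(sink) >= 0) sink.clear(); } catch (Exception ignored) { }
                    });
                    reader.setDaemon(true);
                    reader.start();
                }
            } catch (Exception ignored) { }
        });
        acceptor.setDaemon(true);
        acceptor.start();

        // Each thread writes a HEAP buffer; the JDK copies it into a cached
        // per-thread DIRECT buffer sized to the whole request before writing.
        Thread[] writers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            writers[i] = new Thread(() -> {
                try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("127.0.0.1", port))) {
                    ch.write(ByteBuffer.allocate(requestSize)); // heap buffer -> hidden direct copy
                } catch (Exception ignored) { }
            });
            writers[i].start();
        }
        for (Thread w : writers) w.join();

        // Report native memory still held by direct buffers (freed only by a GC).
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                System.out.printf("direct buffers: count=%d, used=%d MB%n",
                        pool.getCount(), pool.getMemoryUsed() / (1024 * 1024));
            }
        }
    }
}

The expected report is roughly threads x request size of direct memory still allocated even though every buffer is already unreachable, because the native memory is released only when the owning DirectByteBuffer objects are eventually collected.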
On Sun, Dec 4, 2011 at 12:24 PM, lars hofhansl <lhofha...@yahoo.com> wrote:

Thanks. Now the question is: how many connection threads do we have?

I think there is one per regionserver, which would indeed be a problem.
I need to look at the code again (I'm only partially familiar with the client code).

Either the client should chunk (like the server does), or there should be a limited number of threads that perform IO on behalf of the client (or both).

-- Lars

----- Original Message -----
From: Gaojinchao <gaojinc...@huawei.com>
To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <lhofha...@yahoo.com>
Cc: Chenjian <jean.chenj...@huawei.com>; wenzaohua <wenzao...@huawei.com>
Sent: Saturday, December 3, 2011 11:22 PM
Subject: Re: FeedbackRe: Suspected memory leak

This is the dump stack.

-----Original Message-----
From: lars hofhansl [mailto:lhofha...@yahoo.com]
Sent: December 4, 2011 14:15
To: dev@hbase.apache.org
Cc: Chenjian; wenzaohua
Subject: Re: FeedbackRe: Suspected memory leak

Dropping the user list.

Could you (or somebody) point me to where the client is using NIO?
I'm looking at HBaseClient and I do not see references to NIO; it also seems that all work is handed off to separate threads (HBaseClient.Connection), and the JDK will not cache more than 3 direct buffers per thread.

It's possible (likely?) that I missed something in the code.

Thanks.

-- Lars

________________________________
From: Gaojinchao <gaojinc...@huawei.com>
To: "u...@hbase.apache.org" <u...@hbase.apache.org>; "dev@hbase.apache.org" <dev@hbase.apache.org>
Cc: Chenjian <jean.chenj...@huawei.com>; wenzaohua <wenzao...@huawei.com>
Sent: Saturday, December 3, 2011 7:57 PM
Subject: FeedbackRe: Suspected memory leak

Thank you for your help.

This issue appears to be a configuration problem:
1. The HBase client uses the NIO (socket) API, which uses direct memory.
2. The default -XX:MaxDirectMemorySize value is equal to the -Xmx value, so if no full GC ever runs, none of the direct memory can be reclaimed. Unfortunately, with our client's GC configuration no full GC is ever produced.

This is only a preliminary result; all tests are still running, and we will feed back any further findings.
Finally, I will update our story in issue https://issues.apache.org/jira/browse/HBASE-4633.

If our digging is correct, should we set a default value for -XX:MaxDirectMemorySize to prevent this situation?

Thanks
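A minimal Java sketch of point 2 above: the native memory behind direct buffers is released only when the garbage collector runs, so without a full GC the "used" number only grows. The class name, loop sizes, and the -XX:MaxDirectMemorySize=256m value mentioned in the comment are illustrative assumptions, and System.gc() is only a hint to the JVM.

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectMemoryReclaim {

    // Native memory currently held by direct ByteBuffers, as reported by the JVM.
    private static long directUsedMB() {
        for (BufferPoolMXBean p : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(p.getName())) {
                return p.getMemoryUsed() / (1024 * 1024);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        // Allocate 32 x 4 MB of direct memory and immediately drop the references.
        for (int i = 0; i < 32; i++) {
            ByteBuffer buf = ByteBuffer.allocateDirect(4 * 1024 * 1024);
            buf = null; // unreachable, but the native memory is NOT freed yet
        }
        System.out.println("direct used before GC: " + directUsedMB() + " MB");

        // Only a (full) GC lets the buffers' cleaners run and return the native memory.
        // Running with a cap such as -XX:MaxDirectMemorySize=256m (example value) keeps the
        // total bounded: when the cap is hit the JDK triggers a collection itself before
        // failing, instead of letting direct memory grow toward -Xmx.
        System.gc();
        Thread.sleep(500);
        System.out.println("direct used after GC:  " + directUsedMB() + " MB");
    }
}

Whether HBase should ship a default for -XX:MaxDirectMemorySize is the open question above; the sketch only shows why the current default (equal to -Xmx) lets an otherwise healthy client hold gigabytes of native memory between full GCs.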
-----Original Message-----
From: bijieshan [mailto:bijies...@huawei.com]
Sent: December 2, 2011 15:37
To: dev@hbase.apache.org; u...@hbase.apache.org
Cc: Chenjian; wenzaohua
Subject: Re: Suspected memory leak

Thank you all.
I think it's the same problem as in the link provided by Stack, because the heap size is stable while the non-heap size keeps growing, so I don't think it is the CMS GC bug.
We have also looked at the content of the problematic memory section; all the records contain info like the following:

"|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||";
"BBZHtable_UFDR_058,048342220093168-02570"
........

Jieshan.

-----Original Message-----
From: Kihwal Lee [mailto:kih...@yahoo-inc.com]
Sent: December 2, 2011 4:20
To: dev@hbase.apache.org
Cc: Ramakrishna s vasudevan; u...@hbase.apache.org
Subject: Re: Suspected memory leak

Adding to the excellent write-up by Jonathan:
Since finalizers are involved, it takes two GC cycles to collect these objects. Due to a bug (or bugs) in the CMS GC, collection may not happen and the heap can grow really big. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.

Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket-related objects were being collected properly. This option forces the concurrent marker to use a single thread. This was for HDFS, but I think the same applies here.

Kihwal

On 12/1/11 1:26 PM, "Stack" <st...@duboce.net> wrote:

Make sure it's not the issue that Jonathan Payne identified a while back:
https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#

St.Ack
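A minimal Java sketch of the "two GC cycles" point in Kihwal's note: the first collection only discovers a finalizable object and queues it for finalization; its memory can be reclaimed by a later collection, after finalize() has run. The Resource class is a stand-in (the real cases are JDK socket/stream classes with finalizers), System.gc() is only a hint, and the expected output noted in the comments assumes the hint is honored.

import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;

public class TwoCycleFinalization {

    // Stand-in for a finalizable resource such as an old socket implementation.
    static class Resource {
        @Override
        protected void finalize() {
            System.out.println("finalize() ran");
        }
    }

    public static void main(String[] args) throws Exception {
        // A phantom reference is enqueued only once the object is truly reclaimable,
        // i.e. after its finalizer has run, so it lets us observe the two phases.
        ReferenceQueue<Resource> queue = new ReferenceQueue<>();
        PhantomReference<Resource> ref = new PhantomReference<>(new Resource(), queue);

        System.gc();                       // cycle 1: object discovered, queued for finalization
        Thread.sleep(200);
        System.out.println("reclaimable after 1st GC: " + (queue.poll() != null)); // expect: false

        System.runFinalization();          // finalize() runs between the two cycles
        System.gc();                       // cycle 2: memory can now actually be reclaimed
        Thread.sleep(200);
        System.out.println("reclaimable after 2nd GC: " + (queue.poll() != null)); // expect: true

        // Keep the phantom reference object itself reachable until the end of the test.
        System.out.println("done (ref=" + ref + ")");
    }
}

Under normal operation both steps complete; Kihwal's point is that the CMS bug in the linked report can keep this from happening, which is what the -XX:-CMSConcurrentMTEnabled workaround addressed for HDFS.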