To Ted... yes, sorry, sendParam.

Any better solution involves changing the code.

I could envision a form of active object where all NIO is handled by a small
pool of threads, or chunking into (say) 8k chunks on the client. Or both.

In either case there would be less direct buffer garbage produced by the client.
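
For illustration, a minimal sketch of the chunking idea (hypothetical code,
not the actual HBaseClient internals). If I read the JDK right, it sizes the
direct buffer it caches per thread to match each write, so capping the write
size also caps the cached buffer size:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.WritableByteChannel;

    public final class ChunkedWriter {
        private static final int CHUNK = 8 * 1024; // 8k, as suggested above

        // Write src in slices of at most 8k, so the temporary direct
        // buffer the JDK caches for this thread never exceeds 8k.
        public static void write(WritableByteChannel ch, ByteBuffer src)
                throws IOException {
            while (src.hasRemaining()) {
                int oldLimit = src.limit();
                src.limit(Math.min(src.position() + CHUNK, oldLimit));
                while (src.hasRemaining()) {
                    ch.write(src);
                }
                src.limit(oldLimit);
            }
        }
    }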

Why is sendParam called directly by the client (app) threads? Is it to enforce 
ordering? 

Lastly, -XX:MaxDirectMemorySize should definitely be documented.
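
For example, something like this in the client's JVM options (the 256m value
is purely illustrative, assuming the usual hbase-env.sh mechanism):

    # conf/hbase-env.sh -- cap the client's direct buffer usage
    export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=256m"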

-- Lars

Gaojinchao <[email protected]> schrieb:

>OK. Does anyone have a better solution? Do we need to cover this in the book?
>
>
>-----Original Message-----
>From: Ted Yu [mailto:[email protected]]
>Sent: December 5, 2011 11:39
>To: [email protected]
>Subject: Re: FeedbackRe: Suspected memory leak
>
>Jinchao:
>Since we found the workaround, can you summarize the following statistics
>on HBASE-4633?
>
>Thanks
>
>2011/12/4 Gaojinchao <[email protected]>
>
>> Yes, I have tested it; the system is fine.
>> Roughly every hour, a full GC is triggered.
>> 10022.210: [Full GC (System) 10022.210: [Tenured:
>> 577566K->257349K(1048576K), 1.7515610 secs] 9651924K->257349K(14260672K),
>> [Perm : 19161K->19161K(65536K)], 1.7518350 secs] [Times: user=1.75
>> sys=0.00, real=1.75 secs]
>> .........
>>
>> .........
>> 13532.930: [GC 13532.931: [ParNew: 12801558K->981626K(13212096K),
>> 0.1414370 secs] 13111752K->1291828K(14260672K), 0.1416880 secs] [Times:
>> user=1.90 sys=0.01, real=0.14 secs]
>> 13624.630: [Full GC (System) 13624.630: [Tenured:
>> 310202K->175378K(1048576K), 1.9529280 secs] 11581276K->175378K(14260672K),
>> [Perm : 19225K->19225K(65536K)], 1.9531660 secs]
>>           [Times: user=1.94 sys=0.00, real=1.96 secs]
>>
>> 7543 root      20   0 17.0g  15g 9892 S    0 32.9   1184:34 java
>> 7543 root      20   0 17.0g  15g 9892 S    1 32.9   1184:34 java
>>
>> -----Original Message-----
>> From: Ted Yu [mailto:[email protected]]
>> Sent: December 5, 2011 9:06
>> To: [email protected]
>> Subject: Re: FeedbackRe: Suspected memory leak
>>
>> Can you try specifying -XX:MaxDirectMemorySize with a moderate value and see
>> if the leak gets under control?
>>
>> Thanks
>>
>> 2011/12/4 Gaojinchao <[email protected]>
>>
>> > I have attached the stack in
>> > https://issues.apache.org/jira/browse/HBASE-4633.
>> > I will update our story.
>> >
>> >
>> > -----Original Message-----
>> > From: Ted Yu [mailto:[email protected]]
>> > Sent: December 5, 2011 7:37
>> > To: [email protected]; lars hofhansl
>> > Subject: Re: FeedbackRe: Suspected memory leak
>> >
>> > I looked through TRUNK and 0.90 code but didn't find
>> > HBaseClient.Connection.setParam().
>> > The method should be sendParam().
>> >
>> > When I was in China I tried to access Jonathan's post but wasn't able to.
>> >
>> > If Jinchao's stack trace resonates with the one Jonathan posted, we should
>> > consider using netty for HBaseClient.
>> >
>> > Cheers
>> >
>> > On Sun, Dec 4, 2011 at 1:12 PM, lars hofhansl <[email protected]> wrote:
>> >
>> > > I think HBASE-4508 is unrelated.
>> > > The "connections" I referring to are HBaseClient.Connection objects
>> (not
>> > > HConnections).
>> > > It turns out that HBaseClient.Connection.setParam is actually called
>> > > directly by the client threads, which means we can get
>> > > an unlimited amount of DirectByteBuffers (until we get a full GC).
>> > >
>> > > The JDK will cache 3 per thread, with a size necessary to serve the IO. So
>> > > sending some large requests from many threads will lead to OOM.
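>> > >
>> > > For illustration, a tiny standalone demo of that failure mode (hypothetical
>> > > code, not HBase code). Each thread that pushes a large heap ByteBuffer
>> > > through NIO leaves a direct buffer of the same size in that thread's cache:
>> > >
>> > >     import java.io.RandomAccessFile;
>> > >     import java.nio.ByteBuffer;
>> > >     import java.nio.channels.FileChannel;
>> > >
>> > >     public class DirectBufferDemo {
>> > >         public static void main(String[] args) {
>> > >             for (int t = 0; t < 50; t++) {
>> > >                 new Thread(new Runnable() {
>> > >                     public void run() {
>> > >                         try {
>> > >                             FileChannel ch = new RandomAccessFile(
>> > >                                 "/dev/null", "rw").getChannel();
>> > >                             // Writing a 10 MB *heap* buffer makes NIO copy
>> > >                             // it into a cached, per-thread *direct* buffer
>> > >                             // of the same size.
>> > >                             ch.write(ByteBuffer.allocate(10 * 1024 * 1024));
>> > >                             ch.close();
>> > >                             // While this thread lives, its cached buffer
>> > >                             // is pinned: ~50 x 10 MB of direct memory.
>> > >                             Thread.sleep(Long.MAX_VALUE);
>> > >                         } catch (Exception e) {
>> > >                             e.printStackTrace();
>> > >                         }
>> > >                     }
>> > >                 }).start();
>> > >             }
>> > >         }
>> > >     }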
>> > >
>> > > I think there was a related thread about this that Stack forwarded a while
>> > > back from the asynchbase mailing lists.
>> > >
>> > > Jinchao, could you add a text version (not a png image, please :-) ) of
>> > > this to the jira?
>> > >
>> > >
>> > > -- Lars
>> > >
>> > >
>> > >
>> > > ----- Original Message -----
>> > > From: Ted Yu <[email protected]>
>> > > To: [email protected]; lars hofhansl <[email protected]>
>> > > Cc: Gaojinchao <[email protected]>; Chenjian <[email protected]>;
>> > > wenzaohua <[email protected]>
>> > > Sent: Sunday, December 4, 2011 12:43 PM
>> > > Subject: Re: FeedbackRe: Suspected memory leak
>> > >
>> > > I think Jinchao wasn't using HBASE-4508 in his 0.90 distribution because
>> > > 0.90.5 hasn't been released.
>> > > Assuming the NIO consumption is related to the number of connections from
>> > > the client side, it would help to perform benchmarking on 0.90.5.
>> > >
>> > > Jinchao:
>> > > Please attach stack trace to HBASE-4633 so that we can verify our
>> > > assumptions.
>> > >
>> > > Thanks
>> > >
>> > > On Sun, Dec 4, 2011 at 12:24 PM, lars hofhansl <[email protected]>
>> > > wrote:
>> > >
>> > > > Thanks. Now the question is: How many connection threads do we have?
>> > > >
>> > > > I think there is one per regionserver, which would indeed be a problem.
>> > > > Need to look at the code again (I'm only partially familiar with the
>> > > > client code).
>> > > >
>> > > > Either the client should chunk (like the server does), or there should be
>> > > > a limited number of threads that perform IO on behalf of the client (or both).
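>> > > >
>> > > > A rough sketch of the second option (hypothetical, not the actual client
>> > > > code): funnel all channel writes through a small executor, so only the
>> > > > pool threads ever accumulate cached direct buffers.
>> > > >
>> > > >     import java.nio.ByteBuffer;
>> > > >     import java.nio.channels.WritableByteChannel;
>> > > >     import java.util.concurrent.Callable;
>> > > >     import java.util.concurrent.ExecutorService;
>> > > >     import java.util.concurrent.Executors;
>> > > >     import java.util.concurrent.Future;
>> > > >
>> > > >     public final class IoPool {
>> > > >         // Only these 4 threads ever touch the channels, so at most 4
>> > > >         // per-thread direct buffer caches exist, no matter how many
>> > > >         // application threads are sending.
>> > > >         private static final ExecutorService POOL =
>> > > >             Executors.newFixedThreadPool(4);
>> > > >
>> > > >         public static Future<Integer> write(final WritableByteChannel ch,
>> > > >                                             final ByteBuffer buf) {
>> > > >             return POOL.submit(new Callable<Integer>() {
>> > > >                 public Integer call() throws Exception {
>> > > >                     // Assumes one writer per channel; concurrent writers
>> > > >                     // would need per-connection ordering on top of this.
>> > > >                     int n = 0;
>> > > >                     while (buf.hasRemaining()) {
>> > > >                         n += ch.write(buf);
>> > > >                     }
>> > > >                     return n;
>> > > >                 }
>> > > >             });
>> > > >         }
>> > > >     }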
>> > > >
>> > > > -- Lars
>> > > >
>> > > >
>> > > > ----- Original Message -----
>> > > > From: Gaojinchao <[email protected]>
>> > > > To: "[email protected]" <[email protected]>; lars hofhansl <
>> > > > [email protected]>
>> > > > Cc: Chenjian <[email protected]>; wenzaohua <
>> > [email protected]
>> > > >
>> > > > Sent: Saturday, December 3, 2011 11:22 PM
>> > > > Subject: Re: FeedbackRe: Suspected memory leak
>> > > >
>> > > > This is the stack dump.
>> > > >
>> > > >
>> > > > -----Original Message-----
>> > > > From: lars hofhansl [mailto:[email protected]]
>> > > > Sent: December 4, 2011 14:15
>> > > > To: [email protected]
>> > > > Cc: Chenjian; wenzaohua
>> > > > Subject: Re: FeedbackRe: Suspected memory leak
>> > > >
>> > > > Dropping user list.
>> > > >
>> > > > Could you (or somebody) point me to where the client is using NIO?
>> > > > I'm looking at HBaseClient and I do not see references to NIO; also it
>> > > > seems that all work is handed off to separate threads
>> > > > (HBaseClient.Connection), and the JDK will not cache more than 3 direct
>> > > > buffers per thread.
>> > > >
>> > > > It's possible (likely?) that I missed something in the code.
>> > > >
>> > > > Thanks.
>> > > >
>> > > > -- Lars
>> > > >
>> > > > ________________________________
>> > > > From: Gaojinchao <[email protected]>
>> > > > To: "[email protected]" <[email protected]>; "
>> > > [email protected]"
>> > > > <[email protected]>
>> > > > Cc: Chenjian <[email protected]>; wenzaohua <
>> > [email protected]
>> > > >
>> > > > Sent: Saturday, December 3, 2011 7:57 PM
>> > > > Subject: FeedbackRe: Suspected memory leak
>> > > >
>> > > > Thank you for your help.
>> > > >
>> > > > This issue appears to be a configuration problem:
>> > > > 1. The HBase client uses the NIO (socket) API, which uses direct memory.
>> > > > 2. The default -XX:MaxDirectMemorySize value is equal to the -Xmx value, so
>> > > > if no full GC occurs, direct memory is never reclaimed. Unfortunately, the
>> > > > GC configuration parameters of our client never trigger a full GC.
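>> > > >
>> > > > For illustration, a tiny probe of this behavior (hypothetical code; as we
>> > > > understand it, the JDK's direct-buffer allocation path calls System.gc()
>> > > > when the configured limit is hit, which matches the "Full GC (System)"
>> > > > entries in our GC logs once the flag is set):
>> > > >
>> > > >     import java.nio.ByteBuffer;
>> > > >
>> > > >     // Run with e.g. -XX:MaxDirectMemorySize=64m -verbose:gc
>> > > >     public class DirectLimitProbe {
>> > > >         public static void main(String[] args) {
>> > > >             for (int i = 0; i < 1000; i++) {
>> > > >                 // Each 1 MB buffer is dropped immediately. Once the 64m
>> > > >                 // limit is reached, allocation triggers System.gc(), the
>> > > >                 // dead buffers are reclaimed, and the loop continues
>> > > >                 // instead of throwing OutOfMemoryError.
>> > > >                 ByteBuffer.allocateDirect(1024 * 1024);
>> > > >             }
>> > > >         }
>> > > >     }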
>> > > >
>> > > > This is only a preliminary result; all tests are still running. If we have
>> > > > any further results, we will feed them back.
>> > > > Finally, I will update our story in
>> > > > https://issues.apache.org/jira/browse/HBASE-4633.
>> > > >
>> > > > If our digging is correct, should we set a default value for
>> > > > -XX:MaxDirectMemorySize to prevent this situation?
>> > > >
>> > > >
>> > > > Thanks
>> > > >
>> > > > -----Original Message-----
>> > > > From: bijieshan [mailto:[email protected]]
>> > > > Sent: December 2, 2011 15:37
>> > > > To: [email protected]; [email protected]
>> > > > Cc: Chenjian; wenzaohua
>> > > > Subject: Re: Suspected memory leak
>> > > >
>> > > > Thank you all.
>> > > > I think it's the same problem as in the link provided by Stack, because the
>> > > > heap size has stabilized while the non-heap size keeps growing. So I don't
>> > > > think it is the CMS GC bug.
>> > > > And we have examined the contents of the problem memory section; all the
>> > > > records contain info like the following:
>> > > >
>> > > >
>> > > > "|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||";
>> > > > "BBZHtable_UFDR_058,048342220093168-02570"
>> > > > ........
>> > > >
>> > > > Jieshan.
>> > > >
>> > > > -----Original Message-----
>> > > > From: Kihwal Lee [mailto:[email protected]]
>> > > > Sent: December 2, 2011 4:20
>> > > > To: [email protected]
>> > > > Cc: Ramakrishna s vasudevan; [email protected]
>> > > > Subject: Re: Suspected memory leak
>> > > >
>> > > > Adding to the excellent write-up by Jonathan:
>> > > > Since finalizers are involved, it takes two GC cycles to collect them. Due
>> > > > to a bug (or bugs) in the CMS GC, collection may not happen and the heap can
>> > > > grow really big. See
>> > > > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.
>> > > >
>> > > > Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the
>> > > > socket-related objects were being collected properly. This option forces the
>> > > > concurrent marker to be one thread. This was for HDFS, but I think the same
>> > > > applies here.
>> > > >
>> > > > Kihwal
>> > > >
>> > > > On 12/1/11 1:26 PM, "Stack" <[email protected]> wrote:
>> > > >
>> > > > Make sure it's not the issue that Jonathan Payne identified a while back:
>> > > > https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
>> > > > St.Ack
>> > > >
>> > > >
>> > >
>> > >
>> >
>>
