[ 
https://issues.apache.org/jira/browse/HBASE-26088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382327#comment-17382327
 ] 

Michael Stack commented on HBASE-26088:
---------------------------------------

I like the two-line removal fix for branch-2. The default BufferedMutator (BM) 
implementation creates a pool and cleans it up when passed a null. I'm not sure 
how many other BM implementations there are, but a fat release note on the 
2.5.0 release about the change in the param passed to the BM constructor 
should cover us. For 2.4 and 2.3, I'd say use the workaround suggested above. 
For 3.0.0, we should share the connection executor as the javadoc says. Good 
find [~whitney13]
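
The "create a pool and clean it up if passed a null" behavior can be illustrated with a minimal sketch. This is not the HBase implementation: `SketchMutator` is a hypothetical stand-in class, and `Executors.newCachedThreadPool()` is an assumed placeholder for whatever pool the real client builds. It only shows the ownership rule: a pool created internally is shut down on close, while a pool supplied by the caller is left alone.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a buffered mutator; not the HBase class. */
final class SketchMutator implements AutoCloseable {
    private final ExecutorService pool;
    // True only when this object created the pool and is therefore responsible for it.
    private final boolean ownsPool;

    SketchMutator(ExecutorService suppliedPool) {
        if (suppliedPool == null) {
            // No pool supplied: create one and remember to clean it up on close().
            this.pool = Executors.newCachedThreadPool();
            this.ownsPool = true;
        } else {
            // Pool supplied by the caller: the caller keeps responsibility for it.
            this.pool = suppliedPool;
            this.ownsPool = false;
        }
    }

    ExecutorService pool() {
        return pool;
    }

    @Override
    public void close() {
        if (ownsPool) {
            pool.shutdown();
        }
    }
}
```

Under this rule, a caller that builds a fresh pool for every mutator and hands it over must also shut that pool down, otherwise nobody does.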

> conn.getBufferedMutator(tableName) leaks thread executors and other problems
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-26088
>                 URL: https://issues.apache.org/jira/browse/HBASE-26088
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 1.4.13, 2.4.4
>            Reporter: Whitney Jackson
>            Priority: Critical
>
> TL;DR: {{conn.getBufferedMutator(tableName)}} is dangerous in hbase client 
> 2.4.4 and doesn't match documented behavior in 1.4.13.
> To work around the problems until fixed do this:
> {code:java}
> var mySingletonPool = HTable.getDefaultExecutor(hbaseConf);
> var params = new BufferedMutatorParams(tableName);
> params.pool(mySingletonPool);
> var myMutator = conn.getBufferedMutator(params);
> {code}
> And avoid code like this:
> {code:java}
> var myMutator = conn.getBufferedMutator(tableName);
> {code}
> The full story:
> My application started leaking threads after upgrading from hbase client 
> 1.4.13 to 2.4.4. So much so that after less than a minute of runtime more 
> than 30k threads are leaked and all available virtual memory on the box (> 50 
> GB) is consumed. Other processes on the box start crashing with memory 
> allocation errors. Even running {{ls}} at the shell fails with OS resource 
> allocation failures.
> A thread dump after just a few seconds of runtime shows thousands of threads 
> like this:
> {code:java}
> "htable-pool-0" #8841 prio=5 os_prio=0 cpu=0.15ms elapsed=7.49s 
> tid=0x00007efb6d2a1000 nid=0x57d2 waiting on condition [0x00007ef8a6c38000]
>  java.lang.Thread.State: TIMED_WAITING (parking)
>  at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>  - parking to wait for <0x00000007e7cd6188> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
>  at 
> java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6/LockSupport.java:234)
>  at 
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.base@11.0.6/SynchronousQueue.java:462)
>  at 
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@11.0.6/SynchronousQueue.java:361)
>  at 
> java.util.concurrent.SynchronousQueue.poll(java.base@11.0.6/SynchronousQueue.java:937)
>  at 
> java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.6/ThreadPoolExecutor.java:1053)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.6/ThreadPoolExecutor.java:1114)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.6/ThreadPoolExecutor.java:628)
>  at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> {code}
>  
> Note: All the threads are labeled {{htable-pool-0}}. That suggests we're 
> leaking thread executors, not just threads. The {{htable-pool}} part indicates 
> the problem has to do with {{HTable.getDefaultExecutor(conf)}}, and the only 
> part of my code that interacts with that is a call to 
> {{conn.getBufferedMutator(tableName)}}.
>  
> Looking at the hbase client code shows a few problems:
> 1) Neither 1.4.13 nor 2.4.4's behavior matches the documentation for 
> {{conn.getBufferedMutator(tableName)}} which says:
> {quote}This BufferedMutator will use the Connection's ExecutorService.
> {quote}
> That suggests some singleton thread executor is being used, which is not the 
> case.
>  
> 2) Under 1.4.13 you get a new {{ThreadPoolExecutor}} for every 
> {{BufferedMutator}}. That's probably not what you want but you likely won't 
> notice. I didn't. It's a code path I hadn't profiled much.
>  
> 3) Under 2.4.4 you get a new {{ThreadPoolExecutor}} for every 
> {{BufferedMutator}} *and* that {{ThreadPoolExecutor}} *is not* cleaned up 
> after the {{BufferedMutator}} is closed. Each abandoned 
> {{ThreadPoolExecutor}} carries with it one thread, which hangs around until 
> a timeout that defaults to 60 seconds expires.
> My application creates one {{BufferedMutator}} for every incoming stream. 
> There are lots of streams, and some of them are short-lived, so my code 
> leaks threads fast under 2.4.4.
> Here's the part where a new executor is created for every {{BufferedMutator}} 
> (it's similar for 1.4.13):
> [https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L420]
>  
> The reason for the leak in 2.4.4 is the should-we/shouldn't-we cleanup logic 
> added here:
> [https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/BufferedMutatorImpl.java#L104]
> That might be ok if {{pool}} was being initialized there but in the 
> {{conn.getBufferedMutator(tableName)}} code path it's not. {{pool}} is 
> initialized in {{conn.getBufferedMutator}} itself so the executor cleanup 
> code never runs.
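
The leak described in the report can be reproduced in miniature with the JDK alone. The sketch below is an illustration, not HBase code: `ExecutorLeakSketch` and its methods are hypothetical, and the pool settings (core size 1, 60-second keep-alive, `SynchronousQueue`, core thread timeout enabled) are assumptions chosen to resemble the thread dump and the 60-second timeout mentioned in the description. Each call to `openAndAbandonOne()` plays the role of one short-lived mutator whose executor is never shut down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo: one executor per "mutator", never shut down. */
final class ExecutorLeakSketch {
    // Every worker thread created by any pool in this sketch, kept for counting.
    private final List<Thread> threads = new ArrayList<>();
    private final List<ThreadPoolExecutor> pools = new ArrayList<>();

    private ThreadPoolExecutor newPool() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, Integer.MAX_VALUE, 60, TimeUnit.SECONDS,
            new SynchronousQueue<Runnable>(),
            r -> {
                // Each pool names its first thread the same way, as in the thread dump.
                Thread t = new Thread(r, "sketch-pool-0");
                t.setDaemon(true); // keep the demo from blocking JVM exit
                synchronized (threads) {
                    threads.add(t);
                }
                return t;
            });
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    /** Creates a pool, runs one task to completion, and never shuts the pool down. */
    void openAndAbandonOne() {
        ThreadPoolExecutor pool = newPool();
        pools.add(pool);
        try {
            // The task finishes, but its worker thread stays parked until the keep-alive expires.
            pool.submit(() -> { }).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Number of worker threads that are still alive. */
    int liveThreads() {
        synchronized (threads) {
            int alive = 0;
            for (Thread t : threads) {
                if (t.isAlive()) {
                    alive++;
                }
            }
            return alive;
        }
    }

    /** The cleanup that the leaking code path skips: shut down every pool and wait for its workers. */
    void shutdownAll() {
        for (ThreadPoolExecutor p : pools) {
            p.shutdown();
        }
        List<Thread> snapshot;
        synchronized (threads) {
            snapshot = new ArrayList<>(threads);
        }
        try {
            for (Thread t : snapshot) {
                t.join(5_000);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `openAndAbandonOne()` in a loop leaves one idle thread per call until the keep-alive elapses, which matches the pattern of many `htable-pool-0` threads in the dump. Calling `shutdownAll()` releases them immediately.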



