Whitney Jackson created HBASE-26088:
---------------------------------------

             Summary: conn.getBufferedMutator(tableName) leaks thread executors 
and other problems
                 Key: HBASE-26088
                 URL: https://issues.apache.org/jira/browse/HBASE-26088
             Project: HBase
          Issue Type: Bug
          Components: Client
    Affects Versions: 2.4.4, 1.4.13
            Reporter: Whitney Jackson


TL;DR: {{conn.getBufferedMutator(tableName)}} is dangerous in hbase client 
2.4.4 and doesn't match documented behavior in 1.4.13.

To work around the problems until they are fixed, do this:
{code:java}
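// Create one executor and reuse it for every BufferedMutator.
var mySingletonPool = HTable.getDefaultExecutor(hbaseConf);
var params = new BufferedMutatorParams(tableName);
// Pass the shared pool explicitly instead of relying on the default.
params.pool(mySingletonPool);
var myMutator = conn.getBufferedMutator(params);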
{code}
And avoid code like this:
{code:java}
var myMutator = conn.getBufferedMutator(tableName);
{code}
The full story:

My application started leaking threads after upgrading from hbase client 1.4.13 
to 2.4.4. The leak is so severe that after less than a minute of runtime more 
than 30k threads are leaked and all available virtual memory on the box 
(> 50 GB) is consumed. Other processes on the box start crashing with memory 
allocation errors. Even running {{ls}} at the shell fails with OS resource 
allocation failures.

A thread dump after just a few seconds of runtime shows thousands of threads 
like this:
{code:java}
"htable-pool-0" #8841 prio=5 os_prio=0 cpu=0.15ms elapsed=7.49s 
tid=0x00007efb6d2a1000 nid=0x57d2 waiting on condition [0x00007ef8a6c38000]
 java.lang.Thread.State: TIMED_WAITING (parking)
 at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
 - parking to wait for <0x00000007e7cd6188> (a 
java.util.concurrent.SynchronousQueue$TransferStack)
 at 
java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:234)
 at 
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill([email protected]/SynchronousQueue.java:462)
 at 
java.util.concurrent.SynchronousQueue$TransferStack.transfer([email protected]/SynchronousQueue.java:361)
 at 
java.util.concurrent.SynchronousQueue.poll([email protected]/SynchronousQueue.java:937)
 at 
java.util.concurrent.ThreadPoolExecutor.getTask([email protected]/ThreadPoolExecutor.java:1053)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1114)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
 at java.lang.Thread.run([email protected]/Thread.java:834)
{code}
 

Note: All the threads are labeled {{htable-pool-0}}. That suggests we're 
leaking whole thread executors, not just threads. The {{htable-pool}} part 
indicates the problem has to do with {{HTable.getDefaultExecutor(conf)}}, and 
the only part of my code that interacts with that is a call to 
{{conn.getBufferedMutator(tableName)}}.
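
The inference from the thread names can be illustrated with plain JDK classes. This is a minimal sketch, not HBase code: {{PoolNamingDemo}}, {{namedFactory}} and the {{demo-pool}} prefix are hypothetical, and it assumes each executor numbers its own threads from 0. Under that assumption many separate executors all produce a thread ending in {{-0}}, while one shared executor produces increasing suffixes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class PoolNamingDemo {
    // Hypothetical factory: every instance numbers its own threads from 0.
    static ThreadFactory namedFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return r -> new Thread(r, prefix + "-" + counter.getAndIncrement());
    }

    // One executor per task: each executor's first thread gets suffix 0.
    static List<String> firstThreadNamePerPool(int pools) {
        List<String> names = new ArrayList<>();
        Callable<String> task = () -> Thread.currentThread().getName();
        for (int i = 0; i < pools; i++) {
            ExecutorService pool = Executors.newCachedThreadPool(namedFactory("demo-pool"));
            try {
                names.add(pool.submit(task).get());
            } catch (Exception e) {
                throw new RuntimeException(e);
            } finally {
                pool.shutdown();
            }
        }
        return names;
    }

    // One shared executor running the tasks concurrently: suffixes increase.
    static List<String> namesFromOneSharedPool(int tasks) {
        ExecutorService pool = Executors.newCachedThreadPool(namedFactory("demo-pool"));
        CountDownLatch allStarted = new CountDownLatch(tasks);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(() -> {
                    allStarted.countDown();
                    allStarted.await(); // hold the thread so the pool has to grow
                    return Thread.currentThread().getName();
                }));
            }
            List<String> names = new ArrayList<>();
            for (Future<String> f : futures) {
                names.add(f.get());
            }
            Collections.sort(names);
            return names;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstThreadNamePerPool(3)); // [demo-pool-0, demo-pool-0, demo-pool-0]
        System.out.println(namesFromOneSharedPool(3)); // [demo-pool-0, demo-pool-1, demo-pool-2]
    }
}
```

Thousands of threads that all end in {{-0}} therefore match the first pattern: many executors with one thread each.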

 

Looking at the hbase client code shows a few problems:

1) Neither 1.4.13 nor 2.4.4's behavior matches the documentation for 
{{conn.getBufferedMutator(tableName)}} which says:
{quote}This BufferedMutator will use the Connection's ExecutorService.
{quote}
That suggests a single shared thread executor is being used, which is not the 
case.

 

2) Under 1.4.13 you get a new {{ThreadPoolExecutor}} for every 
{{BufferedMutator}}. That's probably not what you want but you likely won't 
notice. I didn't. It's a code path I hadn't profiled much.

 

3) Under 2.4.4 you get a new {{ThreadPoolExecutor}} for every 
{{BufferedMutator}} *and* that {{ThreadPoolExecutor}} *is not* cleaned up after 
the {{Mutator}} is closed. Each abandoned {{ThreadPoolExecutor}} keeps one idle 
thread alive until its keep-alive timeout expires, which defaults to 60 
seconds.

My application creates one {{BufferedMutator}} for every incoming stream. There 
are lots of streams and some of them are short lived, so my code leaks threads 
fast under 2.4.4.
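
The lingering thread described in item 3 can be reproduced with the JDK alone. The following is a rough sketch rather than the HBase implementation: {{IdleWorkerDemo}} is hypothetical, and the pool settings (one core thread, 60 second keep-alive, core threads allowed to time out) are assumptions chosen to resemble the behavior above. It counts the worker threads an executor still holds after its single task has finished, with and without a shutdown.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class IdleWorkerDemo {
    // Assumed settings for illustration only; not copied from HBase.
    static ThreadPoolExecutor newPool() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Runs one task, optionally shuts the pool down as a well-behaved close()
    // would, and returns how many worker threads the pool still holds.
    static int liveWorkersAfterClose(boolean closeShutsDownPool) {
        ThreadPoolExecutor pool = newPool();
        try {
            pool.submit(() -> { }).get(); // the "mutator" does a bit of work
            if (closeShutsDownPool) {
                pool.shutdown();
                pool.awaitTermination(10, TimeUnit.SECONDS);
            }
            return pool.getPoolSize();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdownNow(); // demo cleanup only, so the JVM can exit promptly
        }
    }

    public static void main(String[] args) {
        System.out.println(liveWorkersAfterClose(false)); // 1: idle worker lingers
        System.out.println(liveWorkersAfterClose(true));  // 0: worker released
    }
}
```

Without the shutdown, every short-lived executor pins one thread for the full keep-alive period, so creating executors faster than they time out makes the thread count grow.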

Here's the part where a new executor is created for every {{BufferedMutator}} 
(it's similar for 1.4.13):

[https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L420]

 

The reason for the leak in 2.4.4 is the should-we/shouldn't-we cleanup logic 
added here:

[https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/BufferedMutatorImpl.java#L104]

That might be ok if {{pool}} were being initialized there, but in the 
{{conn.getBufferedMutator(tableName)}} code path it's not. {{pool}} is 
initialized in {{conn.getBufferedMutator}} itself, so the executor cleanup code 
never runs.
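
The ownership problem can be sketched in isolation. All names here ({{OwnershipSketch}}, {{SketchMutator}}, {{ownsPool}}, {{mutatorFromFactory}}) are hypothetical stand-ins, not the HBase classes or fields. The sketch assumes a mutator that only shuts down a pool it created itself, and a factory method that creates the pool and hands it over, which leaves nobody responsible for the cleanup.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OwnershipSketch {
    // Hypothetical mutator: cleans up the pool only if it created the pool.
    static class SketchMutator implements AutoCloseable {
        final ExecutorService pool;
        private final boolean ownsPool;

        SketchMutator(ExecutorService suppliedPool) {
            if (suppliedPool == null) {
                this.pool = Executors.newCachedThreadPool();
                this.ownsPool = true;  // created here, so close() cleans it up
            } else {
                this.pool = suppliedPool;
                this.ownsPool = false; // treated as someone else's pool
            }
        }

        @Override
        public void close() {
            if (ownsPool) {
                pool.shutdown();
            }
        }
    }

    // Hypothetical factory mirroring the problematic path: it creates a fresh
    // pool and passes it in, so the mutator believes the pool is external.
    static SketchMutator mutatorFromFactory() {
        return new SketchMutator(Executors.newCachedThreadPool());
    }

    // Closes the mutator and reports whether its pool was shut down by close().
    static boolean poolShutDownByClose(SketchMutator mutator) {
        mutator.close();
        boolean shutDown = mutator.pool.isShutdown();
        mutator.pool.shutdown(); // demo cleanup only
        return shutDown;
    }

    public static void main(String[] args) {
        System.out.println(poolShutDownByClose(mutatorFromFactory()));      // false
        System.out.println(poolShutDownByClose(new SketchMutator(null)));   // true
    }
}
```

In this sketch the factory-created pool survives {{close()}}. One possible remedy is for whichever side creates the pool to also record that it must be shut down.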



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
