Whitney Jackson created HBASE-26088:
---------------------------------------
Summary: conn.getBufferedMutator(tableName) leaks thread executors
and other problems
Key: HBASE-26088
URL: https://issues.apache.org/jira/browse/HBASE-26088
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 2.4.4, 1.4.13
Reporter: Whitney Jackson
TL;DR: {{conn.getBufferedMutator(tableName)}} leaks thread executors in hbase
client 2.4.4, and in both 2.4.4 and 1.4.13 it doesn't match the documented
behavior.
To work around the problems until they are fixed, do this:
{code:java}
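// Create this pool once and share it across every BufferedMutator.
var mySingletonPool = HTable.getDefaultExecutor(hbaseConf);
var params = new BufferedMutatorParams(tableName);
// Pass the shared pool explicitly instead of relying on the default.
params.pool(mySingletonPool);
var myMutator = conn.getBufferedMutator(params);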
{code}
And avoid code like this:
{code:java}
var myMutator = conn.getBufferedMutator(tableName);
{code}
The full story:
My application started leaking threads after upgrading from hbase client 1.4.13
to 2.4.4. The leak is severe: after less than a minute of runtime, more than
30k threads have leaked and all available virtual memory on the box (> 50 GB)
is consumed. Other processes on the box start crashing with memory allocation
errors. Even running {{ls}} at the shell fails with OS resource allocation
failures.
A thread dump after just a few seconds of runtime shows thousands of threads
like this:
{code:java}
"htable-pool-0" #8841 prio=5 os_prio=0 cpu=0.15ms elapsed=7.49s
tid=0x00007efb6d2a1000 nid=0x57d2 waiting on condition [0x00007ef8a6c38000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
- parking to wait for <0x00000007e7cd6188> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:234)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill([email protected]/SynchronousQueue.java:462)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer([email protected]/SynchronousQueue.java:361)
at java.util.concurrent.SynchronousQueue.poll([email protected]/SynchronousQueue.java:937)
at java.util.concurrent.ThreadPoolExecutor.getTask([email protected]/ThreadPoolExecutor.java:1053)
at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1114)
at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
at java.lang.Thread.run([email protected]/Thread.java:834)
{code}
Note: All the threads are labeled {{htable-pool-0}}. If one executor were
creating them I would expect the numeric suffix to increase, so this suggests
we're leaking thread executors, not just threads. The {{htable-pool}} part
indicates the problem is related to {{HTable.getDefaultExecutor(conf)}}, and
the only part of my code that interacts with that is a call to
{{conn.getBufferedMutator(tableName)}}.
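To illustrate why identical thread names point at many executors rather than one, here is a minimal stand-alone sketch using only the JDK. It is not HBase code: the {{PoolNamingDemo}} class, the {{numbered}} factory, and the {{demo-pool}} prefix are made up for illustration, and I am assuming the real thread factory numbers threads per factory instance in a similar way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class PoolNamingDemo {

  // Task that reports the name of the worker thread it runs on.
  static final Callable<String> WHO_AM_I = () -> Thread.currentThread().getName();

  // Hypothetical factory: names threads "<prefix>-<n>", where n is counted
  // per factory instance (assumed to resemble the real naming scheme).
  static ThreadFactory numbered(String prefix) {
    AtomicInteger next = new AtomicInteger();
    return r -> {
      Thread t = new Thread(r, prefix + "-" + next.getAndIncrement());
      t.setDaemon(true);
      return t;
    };
  }

  // One brand-new pool (with its own factory) per task.
  static List<String> namesFromSeparatePools(int executors) {
    List<String> names = new ArrayList<>();
    try {
      for (int i = 0; i < executors; i++) {
        ExecutorService pool = Executors.newCachedThreadPool(numbered("demo-pool"));
        try {
          names.add(pool.submit(WHO_AM_I).get());
        } finally {
          pool.shutdown();
        }
      }
      return names;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    }
  }

  // One shared pool; a fixed pool starts a new worker for each of the first
  // `tasks` submissions, so every task lands on a distinct thread.
  static List<String> namesFromSharedPool(int tasks) {
    ExecutorService pool = Executors.newFixedThreadPool(tasks, numbered("demo-pool"));
    try {
      List<Future<String>> pending = new ArrayList<>();
      for (int i = 0; i < tasks; i++) {
        pending.add(pool.submit(WHO_AM_I));
      }
      List<String> names = new ArrayList<>();
      for (Future<String> f : pending) {
        names.add(f.get());
      }
      return names;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(namesFromSeparatePools(3)); // [demo-pool-0, demo-pool-0, demo-pool-0]
    System.out.println(namesFromSharedPool(3));    // [demo-pool-0, demo-pool-1, demo-pool-2]
  }
}
```

Thousands of threads that all end in {{-0}} look like the first case: each thread is the first worker of its own executor.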
Looking at the hbase client code shows a few problems:
1) Neither 1.4.13 nor 2.4.4's behavior matches the documentation for
{{conn.getBufferedMutator(tableName)}} which says:
{quote}This BufferedMutator will use the Connection's ExecutorService.
{quote}
That suggests a single shared thread executor is used, which is not the case.
2) Under 1.4.13 you get a new {{ThreadPoolExecutor}} for every
{{BufferedMutator}}. That's probably not what you want but you likely won't
notice. I didn't. It's a code path I hadn't profiled much.
3) Under 2.4.4 you get a new {{ThreadPoolExecutor}} for every
{{BufferedMutator}} *and* that {{ThreadPoolExecutor}} *is not* cleaned up after
the {{BufferedMutator}} is closed. Each abandoned {{ThreadPoolExecutor}} keeps
one idle thread alive until its keep-alive timeout, which defaults to 60
seconds, expires.
My application creates one {{BufferedMutator}} for every incoming stream. There
are lots of streams and some of them are short lived, so my code leaks threads
fast under 2.4.4.
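The effect of an executor that is used once and never shut down can be reproduced with the JDK alone. The sketch below is not HBase code. The pool settings (one core thread, a {{SynchronousQueue}}, a 60 second keep-alive, core threads allowed to time out) are my assumption, chosen to resemble what the thread dump shows, and {{LeakedExecutorDemo}} and {{liveWorkersAfter}} are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class LeakedExecutorDemo {

  // Creates `pools` executors, runs one task on each, optionally shuts each
  // one down, and returns how many worker threads are still alive afterwards.
  static int liveWorkersAfter(int pools, boolean shutDownEachPool) {
    List<Thread> workers = new CopyOnWriteArrayList<>();
    List<ThreadPoolExecutor> created = new ArrayList<>();
    ThreadFactory factory = r -> {
      Thread t = new Thread(r, "demo-pool-0");
      t.setDaemon(true); // so the demo itself cannot keep the JVM alive
      workers.add(t);
      return t;
    };
    try {
      for (int i = 0; i < pools; i++) {
        // Assumed configuration, see the paragraph above.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 60, TimeUnit.SECONDS, new SynchronousQueue<>(), factory);
        pool.allowCoreThreadTimeOut(true);
        created.add(pool);
        pool.submit(() -> { }).get(); // stands in for one flush of buffered writes
        if (shutDownEachPool) {
          pool.shutdown();
          pool.awaitTermination(10, TimeUnit.SECONDS);
        }
      }
      int alive = 0;
      for (Thread t : workers) {
        if (shutDownEachPool) {
          t.join(10_000); // give an exiting worker time to finish dying
        }
        if (t.isAlive()) {
          alive++;
        }
      }
      return alive;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      // Clean up after measuring so the demo does not leak threads itself.
      for (ThreadPoolExecutor pool : created) {
        pool.shutdownNow();
      }
    }
  }

  public static void main(String[] args) {
    System.out.println("abandoned: " + liveWorkersAfter(5, false));
    System.out.println("shut down: " + liveWorkersAfter(5, true));
  }
}
```

On a normally loaded machine I would expect this to report 5 live workers for the abandoned pools and 0 for the pools that were shut down. With one such pool per short-lived stream, the idle workers pile up faster than the 60 second keep-alive can reap them.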
Here's the part where a new executor is created for every {{BufferedMutator}}
(it's similar for 1.4.13):
[https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L420]
The reason for the leak in 2.4.4 is the should-we/shouldn't-we cleanup logic
added here:
[https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/BufferedMutatorImpl.java#L104]
That might be OK if {{pool}} were being initialized there, but in the
{{conn.getBufferedMutator(tableName)}} code path it's not. {{pool}} is
initialized in {{conn.getBufferedMutator}} itself, so the mutator treats it as
a caller-supplied pool and the executor cleanup code never runs.
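As I read it, the pattern boils down to an ownership flag that is set only when the mutator creates the pool itself. The following is a simplified JDK-only sketch of that pattern, not the HBase source. {{OwnershipFlagDemo}}, {{Mutator}}, {{ownsPool}}, and {{createViaFactory}} are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class OwnershipFlagDemo {

  // Hypothetical stand-in for a mutator that shuts its pool down on close
  // only if it created that pool itself.
  static final class Mutator implements AutoCloseable {
    private final ExecutorService pool;
    private final boolean ownsPool;

    Mutator(ExecutorService suppliedPool) {
      if (suppliedPool == null) {
        this.pool = Executors.newCachedThreadPool();
        this.ownsPool = true;   // created here, so cleaned up here
      } else {
        this.pool = suppliedPool;
        this.ownsPool = false;  // assumed to belong to the caller
      }
    }

    ExecutorService pool() {
      return pool;
    }

    @Override
    public void close() {
      if (ownsPool) {
        pool.shutdown();
      }
    }
  }

  // Hypothetical stand-in for the factory path described above: it creates
  // a fresh pool on every call and hands it to the mutator as if it were
  // caller-supplied, so nobody ends up responsible for shutting it down.
  static Mutator createViaFactory() {
    return new Mutator(Executors.newCachedThreadPool());
  }

  // Closes the mutator and reports whether its pool was shut down.
  static boolean poolShutDownAfterClose(Mutator mutator) {
    mutator.close();
    return mutator.pool().isShutdown();
  }

  public static void main(String[] args) {
    System.out.println(poolShutDownAfterClose(new Mutator(null)));   // true
    System.out.println(poolShutDownAfterClose(createViaFactory()));  // false
  }
}
```

The second case mirrors the leak: the only object in a position to shut the pool down believes it does not own it.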