huaxiang sun created HBASE-17889:
------------------------------------
Summary: ResultBoundedCompletionService's cancel() needs to
interrupt the working thread and free it to the thread-pool
Key: HBASE-17889
URL: https://issues.apache.org/jira/browse/HBASE-17889
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 2.0.0, 1.4.0, 1.2.6, 1.3.2
Reporter: huaxiang sun
Assignee: huaxiang sun
We ran into a case with read replicas: when the server hosting the primary
region was shut down, the Get did not go to the replica region; instead it paused
for about 50 seconds before resuming.
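For context, a read that may be served by replicas roughly asks the primary first and, if the primary has not answered within a short timeout, also asks the replicas and takes whichever answer arrives first. The sketch below illustrates that pattern with plain java.util.concurrent. It is an assumption-based illustration, not HBase's client code: ReplicaFallbackSketch, read() and primaryTimeoutMillis are hypothetical names. It cancels the losing calls with interruption, which is the behavior this issue asks for. If every pool thread is already stuck, a submitted replica call cannot even start, which is a plausible reading of why the Get never reached the replica.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ReplicaFallbackSketch {
    /**
     * Hypothetical sketch: call the primary first; if it has not answered within
     * primaryTimeoutMillis, also call the replicas and return the first result.
     * Every call that lost the race is cancelled with interruption.
     */
    public static <T> T read(ExecutorService pool, Callable<T> primary,
                             List<Callable<T>> replicas, long primaryTimeoutMillis)
            throws InterruptedException, ExecutionException {
        CompletionService<T> cs = new ExecutorCompletionService<>(pool);
        List<Future<T>> calls = new ArrayList<>();
        calls.add(cs.submit(primary));
        try {
            // Give the primary a head start.
            Future<T> first = cs.poll(primaryTimeoutMillis, TimeUnit.MILLISECONDS);
            if (first == null) {
                // Primary is slow: fan out to the replicas and take whoever finishes first.
                for (Callable<T> replica : replicas) {
                    calls.add(cs.submit(replica));
                }
                first = cs.take();
            }
            return first.get();
        } finally {
            // cancel(true) interrupts the losers so their pool threads can be reused;
            // it has no effect on calls that already completed.
            for (Future<T> call : calls) {
                call.cancel(true);
            }
        }
    }
}
```

Failure handling is deliberately omitted: if the first completed call failed, a real client would wait for another call instead of rethrowing immediately.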
Further debugging found that, with the server down, one of the threads was
stuck in a socket write while holding the lock at
https://github.com/apache/hbase/blob/branch-1.3/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClientImpl.java#L916.
Subsequent writer threads waited on this lock until every thread in the
connection's thread pool was blocked on it. At that point no work could be
done. Only when the socket write timed out were all the threads released, and
processing continued.
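The stall can be reproduced in miniature. In the sketch below, one task holds a shared lock while "stuck" (a sleep standing in for a socket write to a dead server), the remaining pool threads block trying to enter the same lock, and an unrelated task queued behind them cannot start until the stuck holder lets go. It assumes a plain fixed-size pool and a synchronized monitor; PoolStallDemo and WRITE_LOCK are hypothetical names, not RpcClientImpl's actual fields.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class PoolStallDemo {
    // Hypothetical stand-in for the connection's shared write lock.
    private static final Object WRITE_LOCK = new Object();

    /**
     * Occupies every thread of a fixed pool with "writers" that all need WRITE_LOCK.
     * The first writer to get the lock holds it for stuckMillis (simulating a write to a
     * dead server); the others block on the monitor. Returns how many milliseconds an
     * unrelated task then waits before it gets a thread.
     */
    public static long unrelatedTaskDelayMillis(int poolSize, long stuckMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch lockHeld = new CountDownLatch(1);
        AtomicBoolean firstWriter = new AtomicBoolean(true);
        try {
            for (int i = 0; i < poolSize; i++) {
                pool.submit(() -> {
                    synchronized (WRITE_LOCK) {
                        if (firstWriter.getAndSet(false)) {
                            lockHeld.countDown();
                            sleepQuietly(stuckMillis); // stuck "write" while holding the lock
                        }
                        // Later writers get here only after the stuck one releases the lock.
                    }
                });
            }
            lockHeld.await();
            long start = System.nanoTime();
            pool.submit(() -> { }).get(); // unrelated work, queued behind the blocked writers
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
    }
}
```

Calling unrelatedTaskDelayMillis(4, 500) should report a delay of roughly 500 ms: no thread is free until the stuck holder releases the lock, mirroring the pause that lasted until the socket write timed out.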
When QueueingFuture#cancel() is called, it neither interrupts the working
thread nor returns that thread to the pool.
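The difference between cancelling with and without interruption can be illustrated with java.util.concurrent.Future#cancel(boolean). In the sketch below, a single-thread pool runs a task that blocks (Thread.sleep stands in for an interruptible blocking call). cancel(false) only marks the future as cancelled, so the worker thread stays busy and a follow-up task has to wait; cancel(true) interrupts the worker so it can go back to the pool. CancelDemo is a hypothetical name, and this is not QueueingFuture's actual code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CancelDemo {
    /**
     * Runs a blocking task on a single-thread pool, cancels it with the given
     * mayInterruptIfRunning flag, and returns how many milliseconds a follow-up
     * task waits before the pool's only thread is free to run it.
     */
    public static long followUpDelayMillis(boolean mayInterruptIfRunning, long blockMillis)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch running = new CountDownLatch(1);
        try {
            Future<?> stuck = pool.submit(() -> {
                running.countDown();
                try {
                    Thread.sleep(blockMillis); // stand-in for an interruptible blocking call
                } catch (InterruptedException e) {
                    // cancel(true) lands here: stop working so the thread returns to the pool.
                    Thread.currentThread().interrupt(); // preserve interrupt status
                }
            });
            running.await();
            stuck.cancel(mayInterruptIfRunning); // the future is marked cancelled either way
            long start = System.nanoTime();
            pool.submit(() -> { }).get();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Interruption only helps if the blocked call responds to it. Thread.sleep and operations on a java.nio InterruptibleChannel do, but a thread waiting to enter a synchronized block does not, and a blocking write on a classic java.net.Socket stream generally does not either. Freeing threads that are blocked on the lock may therefore also require an interruptible lock (for example ReentrantLock#lockInterruptibly) or shorter socket timeouts.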
Attaching the jstack trace.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)