[
https://issues.apache.org/jira/browse/HBASE-17889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961605#comment-15961605
]
huaxiang sun commented on HBASE-17889:
--------------------------------------
Running against master, it does not seem to block. I am looking into what the
difference is and will report back.
> ResultBoundedCompletionService's cancel() needs to interrupt the working
> thread and free it to the thread-pool
> --------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-17889
> URL: https://issues.apache.org/jira/browse/HBASE-17889
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.0.0, 1.4.0, 1.2.6, 1.3.2
> Reporter: huaxiang sun
> Assignee: huaxiang sun
> Attachments: HBASE-17889-master-001.patch, jstack.txt
>
>
> We ran into a case with read replicas: when the server hosting the primary
> region was shut down, the Get did not go to the replica region and paused for
> about 50 seconds before resuming.
> Further debugging showed that when the server went down, one of the threads
> was stuck in a socket write while holding the lock at
> https://github.com/apache/hbase/blob/branch-1.3/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClientImpl.java#L916.
> Subsequent write threads waited on this lock until every thread in the
> connection's thread pool was blocked on it. At that point, no work could be
> done. Once the socket write timed out, all the threads were freed and
> processing continued.
> When QueueingFuture#cancel() is called, it neither interrupts the working
> thread nor returns that thread to the pool.
> Attaching the jstack trace.
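
To illustrate the cancel() behaviour described in the issue, here is a minimal sketch using only plain java.util.concurrent classes. It is not the HBase code: the class name CancelInterruptSketch, the helper workerFreedAfterCancel, and the Thread.sleep() stand-in for the stuck write are all assumptions made for illustration. It shows that cancelling a running task without interruption leaves the pool thread occupied, while cancelling with interruption hands the thread back to the pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CancelInterruptSketch {

    /**
     * Submits a task that blocks (a hypothetical stand-in for the stuck write),
     * cancels it, and reports whether the pool's only thread became available
     * to run a follow-up task within waitMillis.
     */
    static boolean workerFreedAfterCancel(boolean mayInterruptIfRunning, long waitMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            CountDownLatch started = new CountDownLatch(1);
            Future<?> blocked = pool.submit(() -> {
                started.countDown();
                try {
                    // Interruptible blocking call standing in for the stuck write.
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    // Interrupted by cancel(true): fall through so the task ends
                    // and the thread goes back to the pool.
                }
            });

            // Make sure the task is actually running before cancelling it.
            started.await();
            blocked.cancel(mayInterruptIfRunning);

            // The follow-up task can only run once the single worker is free.
            CountDownLatch followUpRan = new CountDownLatch(1);
            Runnable followUp = followUpRan::countDown;
            pool.submit(followUp);
            return followUpRan.await(waitMillis, TimeUnit.MILLISECONDS);
        } finally {
            // Interrupts any still-blocked worker so the sketch exits promptly.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cancel(false) frees worker: " + workerFreedAfterCancel(false, 500));
        System.out.println("cancel(true) frees worker: " + workerFreedAfterCancel(true, 5000));
    }
}
```

Note that an interrupt only helps when the blocked call responds to interruption. Thread.sleep() does, but a blocking write on a classic java.io socket stream generally does not, so a real fix may also need the write timeout or channel close to take effect.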
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)