[ https://issues.apache.org/jira/browse/HBASE-17889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960240#comment-15960240 ]
stack commented on HBASE-17889:
-------------------------------
[~enis] FYI. Nice find by [~huaxiang] debugging and replicating a nasty hangup
in read replicas.
> ResultBoundedCompletionService's cancel() needs to interrupt the working
> thread and free it to the thread-pool
> --------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-17889
> URL: https://issues.apache.org/jira/browse/HBASE-17889
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.0.0, 1.4.0, 1.2.6, 1.3.2
> Reporter: huaxiang sun
> Assignee: huaxiang sun
> Attachments: HBASE-17889-master-001.patch, jstack.txt
>
>
> We ran into a case with read replicas: when the server hosting the primary
> region was shut down, a Get did not go to the replica region and instead
> paused for about 50 seconds before it resumed.
> Further debugging showed that when the server went down, one of the threads
> was stuck in a write while holding the lock at
> https://github.com/apache/hbase/blob/branch-1.3/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClientImpl.java#L916.
> Later write threads waited on this lock until every thread in the
> connection's thread pool was blocked on it. At that point, no work could be
> done. Once the socket write timed out, all the threads were freed and
> processing continued.
> When QueueingFuture#cancel() is called, it neither interrupts the working
> thread nor returns the thread to the pool.
> Attaching the jstack trace.
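
To illustrate why the missing interrupt matters, here is a minimal sketch using
only plain java.util.concurrent. It is not the HBase code: it does not use
ResultBoundedCompletionService or QueueingFuture, and the class name
CancelInterruptSketch and method poolFreedAfterCancel are hypothetical.
Thread.sleep stands in for the blocked socket write, and a single-thread pool
stands in for the exhausted connection pool. Whether a real write responds to
an interrupt depends on the I/O in use.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CancelInterruptSketch {

    /**
     * Occupies a one-thread pool with a blocking task, cancels that task, and
     * reports whether a follow-up task completes within waitMillis.
     */
    public static boolean poolFreedAfterCancel(boolean interrupt, long waitMillis)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        CountDownLatch started = new CountDownLatch(1);
        try {
            Future<?> stuck = pool.submit(() -> {
                started.countDown();
                try {
                    // Assumption: stand-in for a thread stuck in a socket write.
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    // Restore the flag and return, which frees the pool thread.
                    Thread.currentThread().interrupt();
                }
            });
            started.await(); // make sure the task is running before cancelling

            // cancel(false) only marks the future as cancelled;
            // cancel(true) also interrupts the thread running the task.
            stuck.cancel(interrupt);

            Future<String> next = pool.submit(() -> "done");
            try {
                next.get(waitMillis, TimeUnit.MILLISECONDS);
                return true;  // the pool thread was released in time
            } catch (TimeoutException e) {
                return false; // the pool thread is still tied up
            }
        } finally {
            pool.shutdownNow(); // interrupt any task still blocked
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cancel(false): pool freed = "
                + poolFreedAfterCancel(false, 500));
        System.out.println("cancel(true):  pool freed = "
                + poolFreedAfterCancel(true, 500));
    }
}
```

In this sketch the follow-up task stays queued after cancel(false), because the
only pool thread is still blocked, and runs promptly after cancel(true). That
is the behaviour the issue title asks for from cancel().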
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)