[ https://issues.apache.org/jira/browse/HBASE-12028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272330#comment-14272330 ]
Hudson commented on HBASE-12028:
--------------------------------
SUCCESS: Integrated in HBase-0.98 #785 (See
[https://builds.apache.org/job/HBase-0.98/785/])
HBASE-12787 Backport HBASE-12028 (Abort the RegionServer when its handler
threads die) to 0.98 (Alicia Ying Shu) (apurtell: rev
b4b1b9c46308747b14620d1010526562a3fc4ff5)
* hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestSimpleRpcScheduler.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java
* hbase-common/src/main/resources/hbase-default.xml
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RWQueueRpcExecutor.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SimpleRpcSchedulerFactory.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/BalancedQueueRpcExecutor.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
> Abort the RegionServer when its handler threads die
> ---------------------------------------------------
>
> Key: HBASE-12028
> URL: https://issues.apache.org/jira/browse/HBASE-12028
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Sudarshan Kadambi
> Assignee: Alicia Ying Shu
> Fix For: 1.0.0, 2.0.0, 1.1.0
>
> Attachments: Hbase-12028-v3.patch, Hbase-12028.patch,
> hbase-12028-v4.patch, hbase-12028-v5-branch-1.patch,
> hbase-12028-v5-master.patch, hbase-12028-v5.patch
>
>
> Over in HBASE-11813, a user identified an issue wherein all the RPC handler
> threads would exit with StackOverflowErrors due to an unchecked
> recursion-terminating condition. Our clusters showed the same stack trace.
> While the patch posted for HBASE-11813 got our clusters healthy again,
> the breakdown surfaced some larger issues.
> Even after all of the RegionServer's RPC handler threads had died, it
> continued to have regions assigned to it. Clearly, it could not serve reads
> and writes on those regions. A second issue was that when a user tried to
> disable or drop a table, the master would try to communicate with the
> RegionServer for region unassignment. Since the same handler threads seem to
> be used for master <-> RS communication as well, the master ended up hanging
> on the RS indefinitely. Eventually, the master stopped responding to all
> table meta-operations.
> A handler thread should never exit, and if one does, the more prudent thing
> to do would be for the RS to abort. This way, recovery can at least begin
> and the regions can be reassigned elsewhere. I also think that the
> master <-> RS communication should get its own exclusive threadpool, but
> I'll wait until this issue has been sufficiently discussed before opening a
> ticket for that.
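The abort-on-handler-death behavior proposed in the description can be sketched in Java as follows. This is a minimal illustration that assumes a simple queue-backed handler thread. `HandlerAbortSketch`, `Handler`, `Abortable`, `recurse`, and `abortsOn` are hypothetical names. They are not the HBase classes changed by the commit above. The key detail is that StackOverflowError is a java.lang.Error, not an Exception, so a handler loop that only catches Exception lets it escape and the thread exits without notice. Catching Throwable at the top of the loop gives the dying handler one last chance to ask the server to abort.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: an RPC handler that escalates its own death to a
 * server-wide abort instead of exiting silently. Not HBase's actual API.
 */
public class HandlerAbortSketch {

    /** Hypothetical stand-in for the RegionServer's abort hook. */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /** A single handler thread draining a shared call queue. */
    static final class Handler extends Thread {
        private final BlockingQueue<Runnable> queue;
        private final Abortable server;

        Handler(String name, BlockingQueue<Runnable> queue, Abortable server) {
            super(name);
            setDaemon(true);
            this.queue = queue;
            this.server = server;
        }

        @Override
        public void run() {
            try {
                while (!isInterrupted()) {
                    Runnable call = queue.take();
                    // StackOverflowError is an Error, so catch (Exception e) would miss it.
                    call.run();
                }
            } catch (InterruptedException e) {
                // Normal shutdown: restore the interrupt flag and exit.
                Thread.currentThread().interrupt();
            } catch (Throwable t) {
                // The handler is dying: escalate instead of exiting quietly.
                server.abort("RPC handler " + getName() + " died", t);
            }
        }
    }

    /** Unbounded recursion, mimicking a missing terminating condition. */
    static int recurse(int depth) {
        return recurse(depth + 1) + 1;
    }

    /** Runs one call on a fresh handler; returns true if it triggered an abort. */
    static boolean abortsOn(Runnable call) throws InterruptedException {
        CountDownLatch aborted = new CountDownLatch(1);
        Abortable server = (why, cause) -> {
            System.out.println("ABORT: " + why + " (" + cause.getClass().getSimpleName() + ")");
            aborted.countDown();
        };
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        Handler handler = new Handler("handler-0", queue, server);
        handler.start();
        queue.put(call);
        boolean result = aborted.await(2, TimeUnit.SECONDS);
        handler.interrupt(); // stop the handler if it is still alive
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("overflowing call aborts: " + abortsOn(() -> recurse(0)));
        System.out.println("normal call aborts: " + abortsOn(() -> { }));
    }
}
```

Aborting, rather than quietly respawning the dead handler, follows the reporter's reasoning. Once the RS is gone, its regions can be reassigned elsewhere instead of staying on a server that cannot serve them.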
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)