[ https://issues.apache.org/jira/browse/HBASE-13351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494848#comment-14494848 ]

Josh Elser commented on HBASE-13351:
------------------------------------
Ah! I think I got to the bottom of why this deadlocks without sufficient
priority-pool threads. {{MasterRpcServices#reportRegionStateTransition}}
ultimately issues a {{Get}} against meta, which is automatically assigned
priority 200 (because it's a request against meta).

So, the region server fires off {{reportRegionStateTransition}} calls to the
Master, and the meta {{Get}} each one triggers ends up going back into the same
thread pool, which has no free threads left to handle the requests. Boom,
deadlock. The confusing part (or at least the part I don't understand) is why
the {{Get}} goes back to the Master and not an RS. Maybe it's because the
Master is acting as an RS? Maybe I just don't completely understand how this
works :)

{noformat}
Daemon Thread [PriorityRpcServer.handler=1,queue=1,port=64100] (Suspended)
    waiting for: AsyncCall (id=891)
    Object.wait(long) line: not available [native method]
    AsyncCall(Object).wait(long, int) line: 461
    AsyncCall(DefaultPromise<V>).await0(long, boolean) line: 355
    AsyncCall(DefaultPromise<V>).await(long, TimeUnit) line: 266
    AsyncCall(AbstractFuture<V>).get(long, TimeUnit) line: 42
    AsyncRpcClient.call(PayloadCarryingRpcController, MethodDescriptor, Message, Message, User, InetSocketAddress) line: 226
    AsyncRpcClient(AbstractRpcClient).callBlockingMethod(Descriptors$MethodDescriptor, PayloadCarryingRpcController, Message, Message, User, InetSocketAddress) line: 213
    AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(Descriptors$MethodDescriptor, RpcController, Message, Message) line: 287
    ClientProtos$ClientService$BlockingStub.get(RpcController, ClientProtos$GetRequest) line: 32391
    HTable$3.call(int) line: 686
    HTable$3.call(int) line: 1
    RpcRetryingCallerImpl<T>.callWithRetries(RetryingCallable<T>, int) line: 117
    HTable.get(Get) line: 694
    MetaTableAccessor.getTableState(Connection, TableName) line: 1075
    TableStateManager.readMetaState(TableName) line: 187
    TableStateManager.getTableState(TableName) line: 171
    TableStateManager.isTableState(TableName, TableState$State...) line: 130
    AssignmentManager.onRegionOpen(RegionState, HRegionInfo, ServerName, RegionServerStatusProtos$RegionStateTransition) line: 2183
    AssignmentManager.onRegionTransition(ServerName, RegionServerStatusProtos$RegionStateTransition) line: 2754
    MasterRpcServices.reportRegionStateTransition(RpcController, RegionServerStatusProtos$ReportRegionStateTransitionRequest) line: 1264
    RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(Descriptors$MethodDescriptor, RpcController, Message) line: 8623
    RpcServer.call(BlockingService, MethodDescriptor, Message, CellScanner, long, MonitoredRPCHandler) line: 2095
    CallRunner.run() line: 101
    BalancedQueueRpcExecutor(RpcExecutor).consumerLoop(BlockingQueue<CallRunner>) line: 130
    RpcExecutor$2.run() line: 107
    Thread.run() line: 745

Daemon Thread [PostOpenDeployTasks:d923ab785d95578230ec49fbb1f40e8e] (Suspended)
    waiting for: AsyncCall (id=808)
    Object.wait(long) line: not available [native method]
    AsyncCall(Object).wait(long, int) line: 461
    AsyncCall(DefaultPromise<V>).await0(long, boolean) line: 355
    AsyncCall(DefaultPromise<V>).await(long, TimeUnit) line: 266
    AsyncCall(AbstractFuture<V>).get(long, TimeUnit) line: 42
    AsyncRpcClient.call(PayloadCarryingRpcController, MethodDescriptor, Message, Message, User, InetSocketAddress) line: 226
    AsyncRpcClient(AbstractRpcClient).callBlockingMethod(Descriptors$MethodDescriptor, PayloadCarryingRpcController, Message, Message, User, InetSocketAddress) line: 213
    AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(Descriptors$MethodDescriptor, RpcController, Message, Message) line: 287
    RegionServerStatusProtos$RegionServerStatusService$BlockingStub.reportRegionStateTransition(RpcController, RegionServerStatusProtos$ReportRegionStateTransitionRequest) line: 9030
    MiniHBaseCluster$MiniHBaseClusterRegionServer(HRegionServer).reportRegionStateTransition(RegionServerStatusProtos$RegionStateTransition$TransitionCode, long, HRegionInfo...) line: 1949
    MiniHBaseCluster$MiniHBaseClusterRegionServer(HRegionServer).postOpenDeployTasks(Region) line: 1884
    OpenRegionHandler$PostOpenDeployTasksThread.run() line: 241
{noformat}
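
The starvation pattern in the traces above can be reproduced in miniature with
plain java.util.concurrent. This is only a sketch and not HBase code:
NestedCallStarvation and nestedCallCompletes are made-up names, a fixed thread
pool stands in for the priority handler pool, and a trivial inner task stands
in for the meta Get. It waits with a timeout so that it terminates instead of
hanging.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model (hypothetical, not HBase code) of handlers that call back into their own pool. */
class NestedCallStarvation {

    /**
     * Submits one "outer" request whose handler blocks on an "inner" request
     * sent to the same pool. Returns true if the inner request finished within
     * the timeout, false if it never got a handler thread.
     */
    static boolean nestedCallCompletes(int handlerThreads, long timeoutMillis) {
        ExecutorService handlers = Executors.newFixedThreadPool(handlerThreads);
        try {
            Future<Boolean> outer = handlers.submit(() -> {
                // Stand-in for the meta Get issued while handling the report.
                Future<String> inner = handlers.submit(() -> "table state");
                try {
                    inner.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    // Every handler was busy, so the inner call stayed queued.
                    inner.cancel(true);
                    return false;
                }
            });
            return outer.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("unexpected failure in the toy model", e);
        } finally {
            handlers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("1 handler:  " + nestedCallCompletes(1, 500));
        System.out.println("2 handlers: " + nestedCallCompletes(2, 500));
    }
}
```

With a single handler, the outer task holds the only thread while it waits, so
the inner task cannot run and the call reports false. With a second handler
the inner task gets a thread and the call should report true. The real case
involves several handlers all blocked the same way, but the shape is the same.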
> Annotate internal MasterRpcServices methods with admin priority
> ---------------------------------------------------------------
>
> Key: HBASE-13351
> URL: https://issues.apache.org/jira/browse/HBASE-13351
> Project: HBase
> Issue Type: Improvement
> Components: master
> Reporter: Josh Elser
> Assignee: Josh Elser
> Fix For: 2.0.0, 1.1.0
>
> Attachments: HBASE-13351-v1.patch, HBASE-13351-v2.patch,
> HBASE-13351-v3.patch, HBASE-13351.patch
>
>
> HBASE-12071, among other things, introduced annotating RPC methods to give
> certain methods priority over others. Namely, this helps ensure that client
> requests cannot starve out internal RPC between master and regionserver.
> Similarly, we can do the same thing for Master RPC methods that are invoked
> by RS's.
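
As a rough illustration of the annotation approach described in the issue
summary, the sketch below tags a method with a priority and reads it back by
reflection. Everything in it is hypothetical: RpcPriority, ToyMasterService,
priorityOf and the numeric values are invented for the example and are not
HBase's actual annotation, classes or constants.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Hypothetical sketch of annotation-driven RPC priority; not HBase code. */
class RpcPriorityDemo {

    static final int NORMAL = 0;   // illustrative values only
    static final int ADMIN = 100;

    /** Made-up marker annotation carrying the priority of an RPC method. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface RpcPriority {
        int value();
    }

    /** Made-up service: one internal method is annotated, one client method is not. */
    static class ToyMasterService {
        @RpcPriority(ADMIN)
        public void reportRegionStateTransition() { }

        public void getTableNames() { }
    }

    /** Returns the annotated priority of the named method, or NORMAL if it has none. */
    static int priorityOf(Class<?> service, String methodName) {
        for (Method m : service.getMethods()) {
            if (m.getName().equals(methodName)) {
                RpcPriority p = m.getAnnotation(RpcPriority.class);
                return p != null ? p.value() : NORMAL;
            }
        }
        throw new IllegalArgumentException("no such method: " + methodName);
    }

    public static void main(String[] args) {
        System.out.println(priorityOf(ToyMasterService.class, "reportRegionStateTransition"));
        System.out.println(priorityOf(ToyMasterService.class, "getTableNames"));
    }
}
```

A dispatcher could use such a lookup to route annotated internal calls to a
separate handler pool so that ordinary client requests cannot starve them.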