[
https://issues.apache.org/jira/browse/HBASE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052475#comment-14052475
]
Hadoop QA commented on HBASE-11460:
-----------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12653968/11460-v1.txt
against trunk revision .
ATTACHMENT ID: 12653968
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include
any new or modified tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.
{color:red}-1 findbugs{color}. The patch appears to introduce 4 new
Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//console
This message is automatically generated.
> Deadlock in HMaster on masterAndZKLock in HConnectionManager
> ------------------------------------------------------------
>
> Key: HBASE-11460
> URL: https://issues.apache.org/jira/browse/HBASE-11460
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.96.0
> Reporter: Andrey Stepachev
> Assignee: Ted Yu
> Priority: Critical
> Fix For: 0.99.0
>
> Attachments: 11460-v1.txt, threads.tdump
>
>
> On one of our clusters we got a deadlock in HMaster.
> In a nutshell, the deadlock is caused by using a single HConnectionManager both for
> client-like calls and for calls made from HMaster RPC handlers.
> HBaseAdmin uses HConnectionManager, which takes the lock masterAndZKLock.
> On the other side sits TableNamespaceManager (TNM). This class
> uses HConnectionManager too (in my case, to get the list of available
> namespaces).
> The problem is that HMaster uses TNM to serve RPC requests.
> If we look at TNM more closely, we can see that this class is fully
> synchronized.
> That gives us a problem.
> The web interface issues a request via HConnectionManager and takes masterAndZKLock.
> The connection is blocking, so RpcClient waits for the reply while
> holding the lock.
> This is how it looks in the thread dump:
> {code}
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000c8905430> (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
> at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1435)
> - locked <0x00000000c8905430> (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
> at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653)
> at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711)
> at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1467)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2093)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1819)
> - locked <0x00000000d15dc668> (a java.lang.Object)
> at org.apache.hadoop.hbase.client.HBaseAdmin$MasterCallable.prepare(HBaseAdmin.java:3187)
> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:119)
> - locked <0x00000000cd0c1238> (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:96)
> - locked <0x00000000cd0c1238> (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
> at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3214)
> at org.apache.hadoop.hbase.client.HBaseAdmin.listTableDescriptorsByNamespace(HBaseAdmin.java:2265)
> {code}
> Some other client then calls an HMaster RPC that invokes TableNamespaceManager
> methods, which in turn block on the HConnectionManager global lock
> masterAndZKLock.
> This is how it looks:
> {code}
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveZooKeeperWatcher(HConnectionManager.java:1699)
> - waiting to lock <0x00000000d15dc668> (a java.lang.Object)
> at org.apache.hadoop.hbase.client.ZooKeeperRegistry.isTableOnlineState(ZooKeeperRegistry.java:100)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isTableDisabled(HConnectionManager.java:874)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:1027)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:852)
> at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:72)
> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:119)
> - locked <0x00000000cd0ef108> (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
> at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:705)
> at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:144)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1102)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1162)
> - locked <0x00000000d1b49fd8> (a java.lang.Object)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1054)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1011)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:852)
> at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:72)
> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:119)
> - locked <0x00000000cd0ef248> (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:756)
> at org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:134)
> at org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:118)
> - locked <0x00000000d189da20> (a org.apache.hadoop.hbase.master.TableNamespaceManager)
> at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3113)
> at org.apache.hadoop.hbase.master.HMaster.listTableDescriptorsByNamespace(HMaster.java:3133)
> at org.apache.hadoop.hbase.master.HMaster.listTableDescriptorsByNamespace(HMaster.java:3034)
> at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38261)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
> at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
> {code}
> Finally, the original handler, which should serve the request from the web UI, can be
> blocked on TNM methods, effectively forming a deadlock.
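
For readers who want to see the shape of the cycle described above, here is a
minimal standalone sketch. It is not HBase code. The class LockCycleSketch and
the names connectionLock and namespaceLock are hypothetical stand-ins for
masterAndZKLock and the TableNamespaceManager monitor. It also simplifies the
report: the web UI thread's wait for an RPC reply is collapsed into a direct
attempt to take the second lock, and timed tryLock calls replace the
indefinite blocking so that the program terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the lock cycle in the thread dumps; not HBase code. */
final class LockCycleSketch {

    /** Returns true when both threads time out waiting for the other's lock. */
    static boolean bothThreadsStall(long timeoutMs) {
        ReentrantLock connectionLock = new ReentrantLock(); // plays masterAndZKLock
        ReentrantLock namespaceLock = new ReentrantLock();  // plays the TNM monitor
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        CountDownLatch bothGaveUp = new CountDownLatch(2);
        AtomicBoolean clientStalled = new AtomicBoolean(false);
        AtomicBoolean handlerStalled = new AtomicBoolean(false);

        // "Web UI" path: holds the connection lock, then needs the namespace lock.
        Thread client = new Thread(() -> holdThenTry(connectionLock, namespaceLock,
                bothHoldFirstLock, bothGaveUp, clientStalled, timeoutMs));
        // "RPC handler" path: holds the namespace lock, then needs the connection lock.
        Thread handler = new Thread(() -> holdThenTry(namespaceLock, connectionLock,
                bothHoldFirstLock, bothGaveUp, handlerStalled, timeoutMs));

        client.start();
        handler.start();
        try {
            client.join();
            handler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return clientStalled.get() && handlerStalled.get();
    }

    private static void holdThenTry(ReentrantLock first, ReentrantLock second,
                                    CountDownLatch bothHoldFirstLock,
                                    CountDownLatch bothGaveUp,
                                    AtomicBoolean stalled, long timeoutMs) {
        first.lock();
        try {
            // Wait until each thread holds its first lock, so the cycle is in place.
            bothHoldFirstLock.countDown();
            bothHoldFirstLock.await();
            // With plain blocking acquisition this would hang forever.
            if (second.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                stalled.set(true);
            }
            // Keep holding the first lock until the other thread has given up too.
            bothGaveUp.countDown();
            bothGaveUp.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

Calling LockCycleSketch.bothThreadsStall(100) should return true, because
neither thread can obtain its second lock while the other still holds it. In
the real report the cycle has one more hop: the thread holding masterAndZKLock
waits on an RPC reply, and the handler that would produce that reply is the
one stuck behind the TableNamespaceManager monitor.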
--
This message was sent by Atlassian JIRA
(v6.2#6252)