[ 
https://issues.apache.org/jira/browse/HBASE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052475#comment-14052475
 ] 

Hadoop QA commented on HBASE-11460:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653968/11460-v1.txt
  against trunk revision .
  ATTACHMENT ID: 12653968

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
                        Please justify why no new tests are needed for this 
patch.
                        Also please list what manual steps were performed to 
verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9970//console

This message is automatically generated.

> Deadlock in HMaster on masterAndZKLock in HConnectionManager
> ------------------------------------------------------------
>
>                 Key: HBASE-11460
>                 URL: https://issues.apache.org/jira/browse/HBASE-11460
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: Andrey Stepachev
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.99.0
>
>         Attachments: 11460-v1.txt, threads.tdump
>
>
> On one of our clusters we got a deadlock in HMaster.
> In a nutshell deadlock caused by using one HConnectionManager for serving 
> client-like calls and calls from HMaster RPC handlers.
> HBaseAdmin uses HConnectionManager which takes a lock masterAndZKLock.
> On the other side of this game sits TablesNamespaceManager (TNM). This class 
> uses HConnectionManager too (in my case for getting list of available 
> namespaces). 
> Problem is that HMaster class uses TNM  for serving RPC requests.
> If we look at TNM more closely, we can see, that this class is totally 
> synchronised.
> Thats gives us a problem.
> WebInterface calls request via HConnectionManager and locks masterAndZKLock.
> Connection is blocking, so RpcClient will spin, awaiting for reply (while 
> holding lock).
> That how it looks like in thread dump:
> {code}
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0x00000000c8905430> (a 
> org.apache.hadoop.hbase.ipc.RpcClient$Call)
>       at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1435)
>       - locked <0x00000000c8905430> (a 
> org.apache.hadoop.hbase.ipc.RpcClient$Call)
>       at 
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653)
>       at 
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1467)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2093)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1819)
>       - locked <0x00000000d15dc668> (a java.lang.Object)
>       at 
> org.apache.hadoop.hbase.client.HBaseAdmin$MasterCallable.prepare(HBaseAdmin.java:3187)
>       at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:119)
>       - locked <0x00000000cd0c1238> (a 
> org.apache.hadoop.hbase.client.RpcRetryingCaller)
>       at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:96)
>       - locked <0x00000000cd0c1238> (a 
> org.apache.hadoop.hbase.client.RpcRetryingCaller)
>       at 
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3214)
>       at 
> org.apache.hadoop.hbase.client.HBaseAdmin.listTableDescriptorsByNamespace(HBaseAdmin.java:2265)
> {code}
> Some other client call any HMaster RPC, and it calls TablesNamespaceManager 
> methods, which in turn will block on HConnectionManager global lock 
> masterAndZKLock.
> That how it looks like:
> {code}
>   java.lang.Thread.State: BLOCKED (on object monitor)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveZooKeeperWatcher(HConnectionManager.java:1699)
>       - waiting to lock <0x00000000d15dc668> (a java.lang.Object)
>       at 
> org.apache.hadoop.hbase.client.ZooKeeperRegistry.isTableOnlineState(ZooKeeperRegistry.java:100)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isTableDisabled(HConnectionManager.java:874)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:1027)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:852)
>       at 
> org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:72)
>       at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:119)
>       - locked <0x00000000cd0ef108> (a 
> org.apache.hadoop.hbase.client.RpcRetryingCaller)
>       at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:705)
>       at 
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:144)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1102)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1162)
>       - locked <0x00000000d1b49fd8> (a java.lang.Object)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1054)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1011)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:852)
>       at 
> org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:72)
>       at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:119)
>       - locked <0x00000000cd0ef248> (a 
> org.apache.hadoop.hbase.client.RpcRetryingCaller)
>       at org.apache.hadoop.hbase.client.HTable.get(HTable.java:756)
>       at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:134)
>       at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:118)
>       - locked <0x00000000d189da20> (a 
> org.apache.hadoop.hbase.master.TableNamespaceManager)
>       at 
> org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3113)
>       at 
> org.apache.hadoop.hbase.master.HMaster.listTableDescriptorsByNamespace(HMaster.java:3133)
>       at 
> org.apache.hadoop.hbase.master.HMaster.listTableDescriptorsByNamespace(HMaster.java:3034)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38261)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
>       at 
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
> {code}
> And finally original handler, which should serve request from WebGUI can be 
> blocked on TNM methods effectively forming dead lock.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to