[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908054#comment-14908054
 ] 

Ted Yu commented on HBASE-14370:
--------------------------------

{code}
fht https://builds.apache.org/job/PreCommit-HBASE-Build/15739/consoleFull
Fetching the console output from the URL
Printing hanging tests
Hanging test : 
org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization
Printing Failing tests
{code}
>From 
>https://builds.apache.org/job/HBase-1.3/jdk=latest1.7,label=Hadoop/203/consoleFull
> :
{code}
"main" prio=10 tid=0x00007fa16c00a800 nid=0x4e8 waiting on condition 
[0x00007fa173269000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1339)
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1159)
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1130)
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1114)
        at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:935)
        at 
org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
        at 
org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:79)
        at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
        at 
org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:95)
        at 
org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:73)
        at 
org.apache.hadoop.hbase.protobuf.generated.AccessControlProtos$AccessControlService$BlockingStub.grant(AccessControlProtos.java:10280)
        at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.grant(ProtobufUtil.java:2209)
        at 
org.apache.hadoop.hbase.security.access.SecureTestUtil$8.call(SecureTestUtil.java:502)
        at 
org.apache.hadoop.hbase.security.access.SecureTestUtil$8.call(SecureTestUtil.java:494)
        at 
org.apache.hadoop.hbase.security.access.SecureTestUtil.updateACLs(SecureTestUtil.java:324)
        at 
org.apache.hadoop.hbase.security.access.SecureTestUtil.grantOnTable(SecureTestUtil.java:494)
        at 
org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization.setUp(TestWithDisabledAuthorization.java:203)
{code}
So it was not regression.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> ------------------------------------------------------------------
>
>                 Key: HBASE-14370
>                 URL: https://issues.apache.org/jira/browse/HBASE-14370
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.0
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>             Fix For: 2.0.0, 0.98.15
>
>         Attachments: 14370-0.98-v10.txt, 14370-branch-1-v10.txt, 
> 14370-branch-1-v10.txt, 14370-branch-1-v10.txt, 14370-v1.txt, 14370-v10.txt, 
> 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 14370-v8.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch, 
> test-acl3-branch-1.stack
>
>
> I came off a support case (0.98.0) where main zk thread was seen doing the 
> following:
> {code}
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at 
> org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl due to lack of fix from HBASE-12635, 
> leading to slowness in table creation because zk notification for region 
> offline was blocked by the above.
> The attached patch separates refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to