[ 
https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739049#comment-14739049
 ] 

Ted Yu commented on HBASE-14370:
--------------------------------

bq. just does log warn that "Something wrong with the TableAuthManager 

The following change in the same if block would take action on top of warning:
{code}
+      instance.getZKPermissionWatcher().getWatcher().abort(msg, null);
{code}
bq. but then declares a ' private Runnable

The private runnable allows a subsequent nodeChildrenChanged event to preempt 
the in-progress handling of the previous nodeChildrenChanged event. The 
rationale is that there is no need to continue processing potentially stale data.
Would renaming the private runnable (e.g. nodeChildrenChangedRunnable) make the 
code more readable?
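
To illustrate the preemption idea, here is a minimal sketch. It is not the code from the attached patches: the class PreemptibleRefresher, the method onChildrenChanged and the refreshOne callback are hypothetical names used only for illustration, and the sketch assumes that a refresh can be abandoned safely between nodes. Each new event cancels the refresh that is still pending or running and submits a new one to a single worker thread, so the notifying thread never does the per-node work itself.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical sketch of a preemptible refresh; not taken from the patch. */
public class PreemptibleRefresher implements AutoCloseable {
  // A single worker thread keeps refresh work off the notifying thread.
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  // Hypothetical per-node callback (stands in for refreshing one ACL node).
  private final Consumer<String> refreshOne;
  // Most recently submitted refresh; guarded by "this".
  private Future<?> pending;

  public PreemptibleRefresher(Consumer<String> refreshOne) {
    this.refreshOne = refreshOne;
  }

  /** Called by the notifying thread; returns without processing any node. */
  public synchronized Future<?> onChildrenChanged(List<String> nodes) {
    if (pending != null) {
      // Interrupt the previous refresh: its data may already be stale.
      pending.cancel(true);
    }
    pending = executor.submit(() -> {
      for (String node : nodes) {
        if (Thread.currentThread().isInterrupted()) {
          return; // a newer event arrived; stop processing stale data
        }
        refreshOne.accept(node);
      }
    });
    return pending;
  }

  @Override
  public void close() {
    executor.shutdownNow();
  }
}
```

In this sketch the worker checks the interrupt flag once per node, so a long list of nodes is abandoned at the next node boundary rather than mid-update.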

As of patch v7, zk notifications are handled in strictly the same order as in 
the current code.

As stated earlier, the customer's use case constantly creates new tables. As of 
last Friday, there were ~2600 tables. I wouldn't be surprised if the table 
count reaches 3000.
Handling zk notifications efficiently therefore becomes important, so that 
notifications for region assignment are not blocked by ACL handling.

> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> ------------------------------------------------------------------
>
>                 Key: HBASE-14370
>                 URL: https://issues.apache.org/jira/browse/HBASE-14370
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.0
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt, 
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where the main zk thread was seen doing 
> the following:
> {code}
>   at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
>   at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
>   at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
>   at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
>   at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl because the fix from HBASE-12635 was 
> missing. This slowed table creation, because the zk notification for region 
> offline was blocked by the above.
> The attached patch moves the refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
