[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739049#comment-14739049 ]
Ted Yu commented on HBASE-14370:
--------------------------------
bq. just does log warn that "Something wrong with the TableAuthManager
The following change in the same if block takes action in addition to logging the warning:
{code}
+ instance.getZKPermissionWatcher().getWatcher().abort(msg, null);
{code}
bq. but then declares a ' private Runnable
The private runnable allows a subsequent nodeChildrenChanged event to preempt
the processing of a previous nodeChildrenChanged event that is still in
progress. The rationale is that there is no need to continue processing
potentially stale data.
Would renaming the private runnable (e.g. to nodeChildrenChangedRunnable) make
the code more readable?
As of patch v7, zk notifications are handled in strictly the same order as in
the current code.
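To illustrate the preemption idea, here is a minimal sketch. It is not the code
from patch v7: the class PreemptibleRefresher, the method onChildrenChanged and
the refreshOne callback are hypothetical names used only for this example, and
the real refresh logic in ZKPermissionWatcher is replaced by a plain callback.
It assumes a single worker thread, so refreshes still run in the order the
events arrived, while a newer event cancels the refresh that is in flight.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical sketch of a preemptible refresh; not the HBASE-14370 patch. */
class PreemptibleRefresher implements AutoCloseable {
  // One worker thread: refreshes run in submission order, off the caller's thread.
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  // Stand-in for the per-node ACL refresh (an assumption for this example).
  private final Consumer<String> refreshOne;
  // The most recently submitted refresh; guarded by "this".
  private Future<?> inFlight;

  PreemptibleRefresher(Consumer<String> refreshOne) {
    this.refreshOne = refreshOne;
  }

  /** Called from the notification thread; returns without doing the refresh itself. */
  synchronized Future<?> onChildrenChanged(List<String> nodes) {
    if (inFlight != null) {
      // Interrupt the previous refresh if it is running, or drop it if still queued.
      inFlight.cancel(true);
    }
    inFlight = executor.submit(() -> {
      for (String node : nodes) {
        if (Thread.currentThread().isInterrupted()) {
          return; // preempted by a newer event; the remaining nodes are stale
        }
        refreshOne.accept(node);
      }
    });
    return inFlight;
  }

  @Override
  public void close() {
    executor.shutdownNow();
  }
}
```

The caller only pays for a cancel and a submit, so the thread delivering
notifications is free to dispatch the next event right away. A callback that
blocks should restore the interrupt flag when it is interrupted, otherwise the
loop above will not notice the preemption.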
As stated earlier, the customer's use case constantly creates new tables. As of
last Friday, there were ~2600 tables. I wouldn't be surprised if the table
count reaches 3000.
Handling zk notifications efficiently becomes important so that the
notifications for region assignment are not blocked by ACL handling.
> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> ------------------------------------------------------------------
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: 14370-v1.txt, 14370-v3.txt, 14370-v5.txt, 14370-v7.txt,
> 14370-wait-nofity-v2.txt, 14370-wait-nofity.txt, hbase-14370_v4.patch
>
>
> I came off a support case (0.98.0) where the main zk thread was seen doing
> the following:
> {code}
> at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
> at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
> at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
> at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl because the fix from HBASE-12635 was
> missing. This slowed down table creation because the zk notification for
> region offline was blocked by the above.
> The attached patch moves the refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.