[ https://issues.apache.org/jira/browse/HBASE-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732066#comment-14732066 ]
Ted Yu commented on HBASE-14370:
--------------------------------
In my opinion, using the main zk thread to refresh AuthManager is not a
scalable design.
The customer cluster has well over 2000 tables, and a workflow constantly
creates new tables.
The time spent running ZKPermissionWatcher#refreshNodes() is not negligible.
We should free the main zk thread to process other ZooKeeper notifications.
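To illustrate the idea, here is a minimal sketch of handing the refresh to a
dedicated thread so that the notification callback returns right away. This is
not the attached patch: the class RefreshOffloader, its method names, and the
thread name "acl-refresh" are hypothetical, and the real refresh logic is
replaced by a plain callback.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper (not HBase code): runs slow refresh work off the event thread. */
final class RefreshOffloader {
    // One worker thread, so refreshes run in the order the notifications arrived.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "acl-refresh");
        t.setDaemon(true);
        return t;
    });

    // Stand-in for the expensive refresh; an assumption for this sketch.
    private final Consumer<List<String>> refresher;

    RefreshOffloader(Consumer<List<String>> refresher) {
        this.refresher = refresher;
    }

    /** Called from the notification thread; only queues the work and returns. */
    void onChildrenChanged(List<String> nodes) {
        executor.execute(() -> {
            try {
                refresher.accept(nodes);
            } catch (RuntimeException e) {
                // Report the failure but keep the worker usable for later refreshes.
                System.err.println("refresh failed: " + e);
            }
        });
    }

    /** Stops accepting work; returns true if queued refreshes finished within the timeout. */
    boolean shutdownAndWait(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With something like this, the watcher callback would only call
onChildrenChanged(...) and return, so other notifications are no longer queued
behind a long refresh. A single worker keeps the refreshes ordered; whether
back-to-back notifications should also be coalesced is a separate question.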
> Use separate thread for calling ZKPermissionWatcher#refreshNodes()
> ------------------------------------------------------------------
>
> Key: HBASE-14370
> URL: https://issues.apache.org/jira/browse/HBASE-14370
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.98.0
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: 14370-v1.txt
>
>
> I came off a support case (0.98.0) where the main zk thread was seen doing
> the following:
> {code}
> at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
> at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
> at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.nodeChildrenChanged(ZKPermissionWatcher.java:121)
> at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> There were 62000 nodes under /acl because the fix from HBASE-12635 was
> missing. This slowed table creation, because the zk notification for region
> offline was blocked behind the call above.
> The attached patch moves the refreshNodes() call into its own thread.
> Thanks to Enis and Devaraj for offline discussion.