[
https://issues.apache.org/jira/browse/HBASE-24211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921393#comment-17921393
]
Emil Kleszcz commented on HBASE-24211:
--------------------------------------
Regarding this closed issue I noticed the following problem in my cluster when
new tables are requested to be created I get this:
{code:java}
<2025-01-27T01:00:16.508+0100> <WARN> <zookeeper.ZKUtil>:
<master:60000-0x3000000286b5bc2,
quorum=quorumnode1:2181,quorumnode2:2181,quorumnode3:2181, baseZNode=/hbase
Unable to get data of znode /hbase/acl/namespace:table>
java.lang.InterruptedException at
java.base/java.lang.Object.wait(Native Method) at
java.base/java.lang.Object.wait(Object.java:328) at
org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1604) at
org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1576) at
org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1970) at
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:405)
at
org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:403)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:382)
at
org.apache.hadoop.hbase.zookeeper.ZKUtil.getChildDataAndWatchForNewChildren(ZKUtil.java:503)
at
org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.lambda$nodeChildrenChanged$0(ZKPermissionWatcher.java:200)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829) {code}
Seems to be related with this change so adding it here. Not sure how bad it is.
I can see this in more busy clusters.
> Create table is slow in large cluster when AccessController is enabled.
> -----------------------------------------------------------------------
>
> Key: HBASE-24211
> URL: https://issues.apache.org/jira/browse/HBASE-24211
> Project: HBase
> Issue Type: Bug
> Components: Performance
> Affects Versions: 1.3.6, 2.2.4
> Reporter: Mohammad Arshad
> Assignee: Mohammad Arshad
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 1.7.0
>
>
> *Problem:*
> In HBase 1.3.x large, performance test, cluster (100 RS, 60k tables, 600k
> regions) a simple table creation takes around 150 seconds. The time taken
> varies but still takes lot of time.
> *Analysis:*
> 1. When HBase creates a table , it calls AssignmentManager#assign(final
> ServerName destination, final List<HRegionInfo> regions)
> In AssignmentManager#assign,it calls asyncSetOfflineInZooKeeper(state, cb,
> destination), and waits in below code loop for 2 minutes.
> {code:java}
> if (useZKForAssignment) {
> // Wait until all unassigned nodes have been put up and watchers
> set.
> int total = states.size();
> for (int oldCounter = 0; !server.isStopped();) {
> int count = counter.get();
> if (oldCounter != count) {
> LOG.debug(destination.toString() + " unassigned znodes=" +
> count +
> " of total=" + total + "; oldCounter=" + oldCounter);
> oldCounter = count;
> }
> if (count >= total) break;
> Thread.sleep(5);
> }
> }
> {code}
> 2. asyncSetOfflineInZooKeeper creates a znode under
> /hbase/region-in-transition/ and calls exist to ensure that znode is created.
> This is simple operation should not take much time. Then where the time it
> taken!!!
> 3. ZooKeeper client API process watcher notification and async API response
> through a queue one by one.
> If there is a delay in any watcher/response processing by the client, in
> this case HBase, all other response processing is delayed. Then it appears as
> if API call has taken more time.
> Same thing happen in this issue.
> Watcher processing for znode creation under /hbase/acl took most of the time
> and delayed /hbase/region-in-transition/region znode creation processing.
> This is why wait in loop was too long.
> 4. Watcher processing for znode creation under hbase/acl/ calls
> ZKPermissionWatcher#nodeChildrenChanged, which internally calls
> ZKUtil.getChildDataAndWatchForNewChildren
> *which calls ZooKeeper's getData API, in this use case, 60k times which
> takes most of the time.*
> *Solutions:*
> Move getChildDataAndWatchForNewChildren call into the async code block in
> ZKPermissionWatcher#nodeChildrenChanged.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)