[jira] [Commented] (HBASE-24211) Create table is slow in large cluster when AccessController is enabled.

Emil Kleszcz (Jira) Mon, 27 Jan 2025 06:05:45 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-24211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921393#comment-17921393
 ]


Emil Kleszcz commented on HBASE-24211:
--------------------------------------

Regarding this closed issue I noticed the following problem in my cluster when 
new tables are requested to be created I get this:
{code:java}
<2025-01-27T01:00:16.508+0100> <WARN> <zookeeper.ZKUtil>: 
<master:60000-0x3000000286b5bc2, 
quorum=quorumnode1:2181,quorumnode2:2181,quorumnode3:2181, baseZNode=/hbase 
Unable to get data of znode /hbase/acl/namespace:table> 
java.lang.InterruptedException         at 
java.base/java.lang.Object.wait(Native Method)         at 
java.base/java.lang.Object.wait(Object.java:328)         at 
org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1604)         at 
org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1576)         at 
org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1970)         at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:405)
         at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:403)       
  at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:382)  
       at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.getChildDataAndWatchForNewChildren(ZKUtil.java:503)
         at 
org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.lambda$nodeChildrenChanged$0(ZKPermissionWatcher.java:200)
         at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)  
       at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
         at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
         at java.base/java.lang.Thread.run(Thread.java:829) {code}
Seems to be related with this change so adding it here. Not sure how bad it is. 
I can see this in more busy clusters.

> Create table is slow in large cluster when AccessController is enabled.
> -----------------------------------------------------------------------
>
>                 Key: HBASE-24211
>                 URL: https://issues.apache.org/jira/browse/HBASE-24211
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>    Affects Versions: 1.3.6, 2.2.4
>            Reporter: Mohammad Arshad
>            Assignee: Mohammad Arshad
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.3.0, 1.7.0
>
>
> *Problem:*
> In HBase 1.3.x  large, performance test, cluster (100 RS, 60k tables, 600k 
> regions) a simple table creation takes around 150 seconds. The time taken 
> varies but still takes lot of time.
> *Analysis:*
> 1. When HBase creates a table , it calls AssignmentManager#assign(final 
> ServerName destination, final List<HRegionInfo> regions)
>  In AssignmentManager#assign，it calls asyncSetOfflineInZooKeeper(state, cb, 
> destination), and waits in below code loop for 2 minutes. 
> {code:java}
>  if (useZKForAssignment) {
>           // Wait until all unassigned nodes have been put up and watchers 
> set.
>           int total = states.size();
>           for (int oldCounter = 0; !server.isStopped();) {
>             int count = counter.get();
>             if (oldCounter != count) {
>               LOG.debug(destination.toString() + " unassigned znodes=" + 
> count +
>                 " of total=" + total + "; oldCounter=" + oldCounter);
>               oldCounter = count;
>             }
>             if (count >= total) break;
>             Thread.sleep(5);
>           }
>         }
> {code}
> 2. asyncSetOfflineInZooKeeper creates a znode under 
> /hbase/region-in-transition/ and calls exist to ensure that znode is created. 
> This is simple operation should not take much time. Then where the time it 
> taken!!!
> 3. ZooKeeper client API process watcher notification and async API response 
> through a queue one by one.
>  If there is a delay in any watcher/response processing by the client, in 
> this case HBase, all other response processing is delayed. Then it appears as 
> if API call has taken more time.
>  Same thing happen in this issue.
> Watcher processing for znode creation under /hbase/acl took most of the time 
> and delayed /hbase/region-in-transition/region znode creation processing. 
> This is why wait in loop was too long. 
> 4. Watcher processing for znode creation under hbase/acl/ calls 
> ZKPermissionWatcher#nodeChildrenChanged, which internally calls 
> ZKUtil.getChildDataAndWatchForNewChildren
>  *which calls ZooKeeper's getData API, in this use case, 60k times which 
> takes most of the time.*
> *Solutions:*
>  Move getChildDataAndWatchForNewChildren call into the async code block in 
> ZKPermissionWatcher#nodeChildrenChanged. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (HBASE-24211) Create table is slow in large cluster when AccessController is enabled.

Reply via email to