[jira] [Comment Edited] (HBASE-24211) Create table is slow in large cluster when AccessController is enabled.

Emil Kleszcz (Jira) Mon, 27 Jan 2025 06:50:39 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-24211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921393#comment-17921393
 ]


Emil Kleszcz edited comment on HBASE-24211 at 1/27/25 2:04 PM:
---------------------------------------------------------------

Regarding this closed issue I noticed the following problem in my cluster when 
new tables are requested to be created I get this:
{code:java}
<2025-01-27T01:00:16.508+0100> <WARN> <zookeeper.ZKUtil>: 
<master:60000-0x3000000286b5bc2, 
quorum=quorumnode1:2181,quorumnode2:2181,quorumnode3:2181, baseZNode=/hbase 
Unable to get data of znode /hbase/acl/namespace:table> 
java.lang.InterruptedException         at 
java.base/java.lang.Object.wait(Native Method)         at 
java.base/java.lang.Object.wait(Object.java:328)         at 
org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1604)         at 
org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1576)         at 
org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1970)         at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:405)
         at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:403)       
  at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:382)  
       at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.getChildDataAndWatchForNewChildren(ZKUtil.java:503)
         at 
org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.lambda$nodeChildrenChanged$0(ZKPermissionWatcher.java:200)
         at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)  
       at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
         at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
         at java.base/java.lang.Thread.run(Thread.java:829) {code}
Seems to be related with this change so adding it here. Not sure how bad it is. 
I can see this in more busy clusters. My ZK timeout is 120s so quite high. It's 
not the connection pool of ZK either. It happens when new tables are created.


was (Author: ekleszcz):
Regarding this closed issue I noticed the following problem in my cluster when 
new tables are requested to be created I get this:
{code:java}
<2025-01-27T01:00:16.508+0100> <WARN> <zookeeper.ZKUtil>: 
<master:60000-0x3000000286b5bc2, 
quorum=quorumnode1:2181,quorumnode2:2181,quorumnode3:2181, baseZNode=/hbase 
Unable to get data of znode /hbase/acl/namespace:table> 
java.lang.InterruptedException         at 
java.base/java.lang.Object.wait(Native Method)         at 
java.base/java.lang.Object.wait(Object.java:328)         at 
org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1604)         at 
org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1576)         at 
org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1970)         at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:405)
         at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:403)       
  at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:382)  
       at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.getChildDataAndWatchForNewChildren(ZKUtil.java:503)
         at 
org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.lambda$nodeChildrenChanged$0(ZKPermissionWatcher.java:200)
         at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)  
       at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
         at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
         at java.base/java.lang.Thread.run(Thread.java:829) {code}
Seems to be related with this change so adding it here. Not sure how bad it is. 
I can see this in more busy clusters.

> Create table is slow in large cluster when AccessController is enabled.
> -----------------------------------------------------------------------
>
>                 Key: HBASE-24211
>                 URL: https://issues.apache.org/jira/browse/HBASE-24211
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>    Affects Versions: 1.3.6, 2.2.4
>            Reporter: Mohammad Arshad
>            Assignee: Mohammad Arshad
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.3.0, 1.7.0
>
>
> *Problem:*
> In HBase 1.3.x  large, performance test, cluster (100 RS, 60k tables, 600k 
> regions) a simple table creation takes around 150 seconds. The time taken 
> varies but still takes lot of time.
> *Analysis:*
> 1. When HBase creates a table , it calls AssignmentManager#assign(final 
> ServerName destination, final List<HRegionInfo> regions)
>  In AssignmentManager#assign，it calls asyncSetOfflineInZooKeeper(state, cb, 
> destination), and waits in below code loop for 2 minutes. 
> {code:java}
>  if (useZKForAssignment) {
>           // Wait until all unassigned nodes have been put up and watchers 
> set.
>           int total = states.size();
>           for (int oldCounter = 0; !server.isStopped();) {
>             int count = counter.get();
>             if (oldCounter != count) {
>               LOG.debug(destination.toString() + " unassigned znodes=" + 
> count +
>                 " of total=" + total + "; oldCounter=" + oldCounter);
>               oldCounter = count;
>             }
>             if (count >= total) break;
>             Thread.sleep(5);
>           }
>         }
> {code}
> 2. asyncSetOfflineInZooKeeper creates a znode under 
> /hbase/region-in-transition/ and calls exist to ensure that znode is created. 
> This is simple operation should not take much time. Then where the time it 
> taken!!!
> 3. ZooKeeper client API process watcher notification and async API response 
> through a queue one by one.
>  If there is a delay in any watcher/response processing by the client, in 
> this case HBase, all other response processing is delayed. Then it appears as 
> if API call has taken more time.
>  Same thing happen in this issue.
> Watcher processing for znode creation under /hbase/acl took most of the time 
> and delayed /hbase/region-in-transition/region znode creation processing. 
> This is why wait in loop was too long. 
> 4. Watcher processing for znode creation under hbase/acl/ calls 
> ZKPermissionWatcher#nodeChildrenChanged, which internally calls 
> ZKUtil.getChildDataAndWatchForNewChildren
>  *which calls ZooKeeper's getData API, in this use case, 60k times which 
> takes most of the time.*
> *Solutions:*
>  Move getChildDataAndWatchForNewChildren call into the async code block in 
> ZKPermissionWatcher#nodeChildrenChanged. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (HBASE-24211) Create table is slow in large cluster when AccessController is enabled.

Reply via email to