[ 
https://issues.apache.org/jira/browse/HBASE-24211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096426#comment-17096426
 ] 

Viraj Jasani commented on HBASE-24211:
--------------------------------------

[~arshad.mohammad] Could you please take a look at the 
TestAccessController.testAccessControllerUserPermsRegexHandling failures with 
this patch? We need a new patch.

[~zhangduo] The issue number I see in the commit is HBASE-24211, which seems 
correct: 
[https://github.com/apache/hbase/commit/e6cc5eb2f0623f02eaa3542308fc3d82fd3abd9f].
 But the reverted commit has the incorrect number HBASE-24212: 
[https://github.com/apache/hbase/commit/39a1bc53f87ede645254ddd1310ded82dd33071c].
 I am a bit confused here. Could you please confirm?

> Create table is slow in large cluster when AccessController is enabled.
> -----------------------------------------------------------------------
>
>                 Key: HBASE-24211
>                 URL: https://issues.apache.org/jira/browse/HBASE-24211
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.3.6, master, 2.2.4
>            Reporter: Mohammad Arshad
>            Assignee: Mohammad Arshad
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0, 1.7.0
>
>
> *Problem:*
> In a large HBase 1.3.x performance-test cluster (100 RS, 60k tables, 600k 
> regions), a simple table creation takes around 150 seconds. The time taken 
> varies from run to run but is always long.
> *Analysis:*
> 1. When HBase creates a table, it calls AssignmentManager#assign(final 
> ServerName destination, final List<HRegionInfo> regions).
>  AssignmentManager#assign then calls asyncSetOfflineInZooKeeper(state, cb, 
> destination) and waits in the loop below for 2 minutes.
> {code:java}
> if (useZKForAssignment) {
>   // Wait until all unassigned nodes have been put up and watchers set.
>   int total = states.size();
>   for (int oldCounter = 0; !server.isStopped();) {
>     int count = counter.get();
>     if (oldCounter != count) {
>       LOG.debug(destination.toString() + " unassigned znodes=" + count +
>         " of total=" + total + "; oldCounter=" + oldCounter);
>       oldCounter = count;
>     }
>     if (count >= total) break;
>     Thread.sleep(5);
>   }
> }
> {code}
> 2. asyncSetOfflineInZooKeeper creates a znode under 
> /hbase/region-in-transition/ and calls exists to ensure that the znode is 
> created. This is a simple operation and should not take much time. So where 
> is the time being spent?
> 3. The ZooKeeper client API processes watcher notifications and async API 
> responses one by one through a single queue.
>  If the client (HBase, in this case) is slow to process any watcher 
> notification or response, all subsequent response processing is delayed. It 
> then appears as if the API call itself took longer.
>  The same thing happens in this issue.
> Watcher processing for znode creation under /hbase/acl took most of the time 
> and delayed processing of the /hbase/region-in-transition/region znode 
> creation. This is why the wait in the loop was so long. 
> 4. Watcher processing for znode creation under /hbase/acl/ calls 
> ZKPermissionWatcher#nodeChildrenChanged, which internally calls 
> ZKUtil.getChildDataAndWatchForNewChildren,
>  *which calls ZooKeeper's getData API 60k times in this use case and accounts 
> for most of the time.*
> *Solution:*
>  Move the getChildDataAndWatchForNewChildren call into the async code block 
> in ZKPermissionWatcher#nodeChildrenChanged. 
>  
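
The queueing effect in point 3 and the proposed fix can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not the HBase patch: the class EventQueueSketch, the method reloadAcl, and the 300 ms cost are hypothetical stand-ins. A single-threaded executor plays the role of the client's event queue, and a second executor plays the role of the async block that the solution moves the expensive call into.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in; none of these names come from HBase or ZooKeeper.
final class EventQueueSketch {

    // Simulates the expensive ACL reload (the 60k getData calls).
    // The 300 ms sleep is an arbitrary, made-up cost.
    static void reloadAcl(List<String> log) {
        try {
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        log.add("acl-reload-done");
    }

    // Runs two queued events and returns the order in which their work finished.
    static List<String> run(boolean offload) {
        List<String> log = Collections.synchronizedList(new ArrayList<String>());
        // One thread drains events in FIFO order, like the queue described in point 3.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        // Separate thread that plays the role of the async code block.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // Event 1: watcher notification for a znode created under /hbase/acl.
            eventThread.execute(() -> {
                if (offload) {
                    worker.execute(() -> reloadAcl(log)); // hand off, return at once
                } else {
                    reloadAcl(log); // blocks the event thread for the whole reload
                }
            });
            // Event 2: async response for the region-in-transition znode.
            eventThread.execute(() -> log.add("rit-callback-done"));

            eventThread.shutdown();
            eventThread.awaitTermination(10, TimeUnit.SECONDS);
            worker.shutdown();
            worker.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<String>(log);
    }

    public static void main(String[] args) {
        System.out.println("inline:    " + run(false));
        System.out.println("offloaded: " + run(true));
    }
}
```

With the reload done inline, the region-in-transition callback can only finish after the reload. With the reload handed off, the callback should finish first, because the event thread is released immediately. The sketch uses a single worker thread so that successive reloads would still run in submission order; whether the real fix needs that ordering guarantee is not stated in this issue.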



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
