Mohammad Arshad created HBASE-24211:
---------------------------------------
Summary: Create table is slow in large cluster when
AccessController is enabled.
Key: HBASE-24211
URL: https://issues.apache.org/jira/browse/HBASE-24211
Project: HBase
Issue Type: Bug
Affects Versions: 2.2.4, 1.3.6, master
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad
*Problem:*
In a large HBase 1.3.x performance-test cluster (100 RegionServers, 60k tables,
600k regions), creating a simple table takes around 150 seconds. The exact time
varies, but it is consistently long.
*Analysis:*
1. When HBase creates a table, it calls AssignmentManager#assign(final
ServerName destination, final List<HRegionInfo> regions).
In AssignmentManager#assign, it calls asyncSetOfflineInZooKeeper(state, cb,
destination) and then waits for about 2 minutes in the loop below.
{code:java}
if (useZKForAssignment) {
  // Wait until all unassigned nodes have been put up and watchers set.
  // `counter` only advances as the async ZooKeeper callbacks are processed.
  int total = states.size();
  for (int oldCounter = 0; !server.isStopped();) {
    int count = counter.get();
    if (oldCounter != count) {
      LOG.debug(destination.toString() + " unassigned znodes=" + count +
        " of total=" + total + "; oldCounter=" + oldCounter);
      oldCounter = count;
    }
    if (count >= total) break;
    Thread.sleep(5);
  }
}
{code}
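The loop never queries ZooKeeper itself; it only polls a counter, so how long it waits depends entirely on when the async callbacks that advance that counter are delivered. The self-contained model below illustrates this, assuming a single callback-delivery thread. WaitLoopModel and waitForCallbacks are hypothetical names, and no HBase or ZooKeeper classes are used.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class WaitLoopModel {

    /**
     * Schedules `total` fake "znode created" callbacks, each delivered after
     * `callbackDelayMs`, then waits with the same polling shape as the
     * AssignmentManager loop. Returns how long the loop waited, in ms.
     */
    static long waitForCallbacks(int total, long callbackDelayMs) {
        AtomicInteger counter = new AtomicInteger();
        // One thread delivers callbacks, like a client-side event thread (assumption).
        ScheduledExecutorService callbackThread = Executors.newSingleThreadScheduledExecutor();
        long start = System.nanoTime();
        try {
            for (int i = 0; i < total; i++) {
                // Each callback bumps the counter only once it is finally delivered.
                callbackThread.schedule(() -> { counter.incrementAndGet(); },
                        callbackDelayMs, TimeUnit.MILLISECONDS);
            }
            // Poll every 5 ms until every callback has been counted.
            while (counter.get() < total) {
                Thread.sleep(5);
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            callbackThread.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("callbacks after 10 ms  -> waited " + waitForCallbacks(3, 10) + " ms");
        System.out.println("callbacks after 500 ms -> waited " + waitForCallbacks(3, 500) + " ms");
    }
}
```

The loop itself is cheap; any delay in delivering the callbacks shows up directly as time spent in it.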
2. asyncSetOfflineInZooKeeper creates a znode under
/hbase/region-in-transition/ and calls exists to make sure the znode was created.
This is a simple operation and should not take much time, so where is the time
going?
3. The ZooKeeper client delivers watcher notifications and async API responses
through a single queue, processing them one at a time.
If the client application (HBase, in this case) is slow to process any watcher
or response, every event queued behind it is delayed as well, so it appears as
if the API call itself took a long time.
That is exactly what happens in this issue.
Watcher processing for a znode creation under /hbase/acl took most of the time
and delayed the processing of the region znode creations under
/hbase/region-in-transition. This is why the wait in the loop was so long.
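A minimal model of this head-of-line blocking, assuming events are handled on one thread: a single-thread executor stands in for the ZooKeeper client's event thread, and SingleEventQueueModel and fastCallbackLatencyMs are hypothetical names. A slow watcher queued ahead of a cheap callback delays that callback by roughly the watcher's running time.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleEventQueueModel {

    /**
     * Queues a slow "watcher" event and then a cheap "async response" callback
     * on ONE thread. Returns the ms from queuing both until the cheap one runs.
     */
    static long fastCallbackLatencyMs(long slowWatcherMs) {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        long[] ranAt = new long[1];
        long queuedAt = System.nanoTime();
        try {
            // 1) Slow watcher, e.g. a children-changed handler doing many reads.
            eventThread.submit(() -> sleepQuietly(slowWatcherMs));
            // 2) Cheap callback, e.g. a create/exists result for a region znode.
            //    It cannot start until the slow watcher has returned.
            eventThread.submit(() -> { ranAt[0] = System.nanoTime(); }).get();
            return TimeUnit.NANOSECONDS.toMillis(ranAt[0] - queuedAt);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            eventThread.shutdownNow();
        }
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("no slow watcher: " + fastCallbackLatencyMs(0) + " ms");
        System.out.println("300 ms watcher:  " + fastCallbackLatencyMs(300) + " ms");
    }
}
```

The cheap callback does no extra work in the second case; all of its added latency comes from waiting behind the slow watcher.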
4. Watcher processing for znode creation under /hbase/acl/ calls
ZKPermissionWatcher#nodeChildrenChanged, which internally calls
ZKUtil.getChildDataAndWatchForNewChildren,
*which in this use case calls ZooKeeper's getData API 60k times; these calls take
most of the time.*
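A back-of-envelope check using only the numbers in this report: if the roughly 2-minute wait is spent mostly on the 60k sequential getData calls, each call averages at most about 2 ms. Even a fast per-call round trip adds up to minutes when the calls run one after another on the event thread.

```latex
t_{\text{getData}} \lesssim \frac{T_{\text{wait}}}{N_{\text{getData}}}
  = \frac{120\,\text{s}}{60\,000} = 2\,\text{ms}
```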
*Solution:*
Move the getChildDataAndWatchForNewChildren call into the async code block in
ZKPermissionWatcher#nodeChildrenChanged, so the ZooKeeper event thread is no
longer blocked by the 60k getData calls.
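A minimal sketch of the proposed change, assuming a dedicated single-thread executor for the refresh work. AsyncPermissionWatcherSketch, refreshAllChildren and close are hypothetical names; the sleep stands in for the per-child getData calls, and the real ZKPermissionWatcher code may be structured differently.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for ZKPermissionWatcher; uses no HBase/ZooKeeper types. */
public class AsyncPermissionWatcherSketch {

    // Dedicated thread for the expensive refresh, so the event thread never runs it.
    private final ExecutorService refreshExecutor = Executors.newSingleThreadExecutor();
    // Paths whose refresh has completed (kept only for illustration/testing).
    final List<String> refreshedPaths = new CopyOnWriteArrayList<>();
    // Simulated cost of reading every child znode.
    private final long refreshCostMs;

    AsyncPermissionWatcherSketch(long refreshCostMs) {
        this.refreshCostMs = refreshCostMs;
    }

    /** Runs on the ZooKeeper event thread, so it must return quickly. */
    void nodeChildrenChanged(String path) {
        // Instead of reading all children inline (blocking every queued event),
        // hand the whole read-and-refresh to the background executor.
        refreshExecutor.submit(() -> refreshAllChildren(path));
    }

    /** Hypothetical placeholder for getChildDataAndWatchForNewChildren + cache refresh. */
    private void refreshAllChildren(String path) {
        try {
            Thread.sleep(refreshCostMs); // stands in for one getData per child znode
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        refreshedPaths.add(path);
    }

    /** Lets queued refreshes finish; returns true if they did within the timeout. */
    boolean close(long timeoutMs) {
        refreshExecutor.shutdown();
        try {
            return refreshExecutor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AsyncPermissionWatcherSketch watcher = new AsyncPermissionWatcherSketch(500);
        long start = System.nanoTime();
        watcher.nodeChildrenChanged("/hbase/acl");
        long returnedAfterMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("event handler returned after " + returnedAfterMs + " ms");
        System.out.println("refresh finished: " + watcher.close(5_000) + " " + watcher.refreshedPaths);
    }
}
```

With this shape the event thread only pays the cost of enqueuing a task, so create/exists callbacks for /hbase/region-in-transition are no longer stuck behind the ACL reads. A single-thread executor also keeps refreshes in arrival order.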