[
https://issues.apache.org/jira/browse/HBASE-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639714#action_12639714
]
stack commented on HBASE-920:
-----------------------------
Made 229 regions. In a cluster with three servers, I restarted a couple of
times. Below are the distributions over a couple of restarts:
Address                  Start Code     Load
13.powerset.com:60020    1224043343420  requests: 251  regions: 77
14.powerset.com:60020    1224043340404  requests: 0    regions: 76
15.u.powerset.com:60020  1224043340366  requests: 1    regions: 78
Total: servers: 3  requests: 252  regions: 229
Balancing ran only once, it seems:
2008-10-15 04:02:24,580 DEBUG org.apache.hadoop.hbase.master.ServerManager: Total Load: 1, Num Servers: 3, Avg Load: 1.0
2008-10-15 04:02:39,583 DEBUG org.apache.hadoop.hbase.master.ServerManager: Total Load: 62, Num Servers: 3, Avg Load: 21.0
2008-10-15 04:03:03,936 DEBUG org.apache.hadoop.hbase.master.ServerManager: Total Load: 229, Num Servers: 3, Avg Load: 77.0
2008-10-15 04:03:04,058 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server XX.XX.XX:60020 is overloaded. Server load: 87 avg: 77.0, slop: 0.1
2008-10-15 04:03:18,966 DEBUG org.apache.hadoop.hbase.master.ServerManager: Total Load: 229, Num Servers: 3, Avg Load: 77.0
2008-10-15 04:03:33,991 DEBUG org.apache.hadoop.hbase.master.ServerManager: Total Load: 229, Num Servers: 3, Avg Load: 77.0
Ran it again a few times. Worst skew was two off the average.
Ran with four servers. Some skew.
Address                Start Code     Load
12.powerset.com:60020  1224044265656  requests: 0  regions: 63
13.powerset.com:60020  1224044265216  requests: 0  regions: 59
14.powerset.com:60020  1224044265160  requests: 0  regions: 53
15.powerset.com:60020  1224044265235  requests: 0  regions: 56
Average 58.
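To put a number on "some skew", here is a small sketch that computes the worst deviation from the mean for a set of per-server region counts. The class and method names are made up for illustration and are not part of HBase. For the four-server run above the mean is 57.75 (the "58" quoted is rounded) and the worst deviation is 5.25 regions, roughly 9% of the mean.

```java
// Hypothetical helper for illustration only; not HBase code.
final class RegionSkew {
    // Returns the largest absolute difference between any server's
    // region count and the mean region count across all servers.
    static double maxDeviation(int[] regionsPerServer) {
        double sum = 0;
        for (int r : regionsPerServer) {
            sum += r;
        }
        double avg = sum / regionsPerServer.length;

        double worst = 0;
        for (int r : regionsPerServer) {
            worst = Math.max(worst, Math.abs(r - avg));
        }
        return worst;
    }

    public static void main(String[] args) {
        // Region counts copied from the four-server listing above.
        int[] fourServers = {63, 59, 53, 56};
        System.out.println("worst deviation: " + maxDeviation(fourServers));
    }
}
```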
> Make region balancing sloppier
> ------------------------------
>
> Key: HBASE-920
> URL: https://issues.apache.org/jira/browse/HBASE-920
> Project: Hadoop HBase
> Issue Type: Improvement
> Reporter: stack
> Assignee: stack
> Fix For: 0.18.1
>
> Attachments: hbase-920.patch
>
>
> The region load balancer is exacting. Here's the logic:
> {code}
> if (avgLoad > 2.0 && thisServersLoad.getNumberOfRegions() > avgLoad) {
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Server " + serverName + " is overloaded. Server load: " +
>       thisServersLoad.getNumberOfRegions() + " avg: " + avgLoad);
>   }
>   ...
> {code}
> On a cluster with thousands of regions, especially around startup or after a
> crash, the above causes a lot of churn as the load balancer closes and opens
> regions to achieve an exact balance (every node must be <= the average).
> I'd suggest that nodes be left alone if they are within some percentage of
> the average -- say 10% (should be configurable).
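
A minimal sketch of what the proposed check could look like, assuming the slop is a fraction applied on top of the average. This is not the code in hbase-920.patch; `SlopCheck`, `isOverloaded` and the `slop` parameter are hypothetical names used only to illustrate the idea.

```java
// Hypothetical illustration of a "sloppy" overload test; not the actual patch.
final class SlopCheck {
    // A server counts as overloaded only when its region count exceeds the
    // average by more than the slop fraction. The avgLoad > 2.0 guard is
    // carried over from the original condition quoted in the issue.
    static boolean isOverloaded(int regions, double avgLoad, double slop) {
        return avgLoad > 2.0 && regions > avgLoad * (1.0 + slop);
    }

    public static void main(String[] args) {
        // Values from the RegionManager log line above: load 87, avg 77.0, slop 0.1.
        System.out.println(isOverloaded(87, 77.0, 0.1)); // true: 87 is above 77 * 1.1
        // One region over the average is tolerated with a 10% slop.
        System.out.println(isOverloaded(78, 77.0, 0.1)); // false
    }
}
```

With `slop` set to 0 this reduces to the exacting comparison in the issue description, so a server one region above the average would again be treated as overloaded.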