[ 
https://issues.apache.org/jira/browse/HBASE-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629060#action_12629060
 ] 

Andrew Purtell commented on HBASE-862:
--------------------------------------

For cluster startup, how about:

* Master waits a short interval for regionserver start messages to arrive.

* After the initial waiting period has elapsed, begin assigning regions based 
on the current count of announced regionservers.

* If additional servers report in, try to get their region count up to the 
current average before assigning additional regions to earlier reporters. 

* Once all regions have been assigned, wait a dampening period before starting 
the balancer. 
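
A rough sketch of the steps above, in Python for brevity rather than as a patch 
against the master. Every name here (assign_on_startup, late_servers, 
balancer_may_start, dampening_seconds) is made up for illustration and is not 
an HBase API. It assumes "least-loaded server first" is an acceptable way to 
bring late reporters up to the current average.

```python
def assign_on_startup(regions, initial_servers, late_servers=None):
    """Assign regions after the initial waiting period has elapsed.

    initial_servers: servers that announced themselves during the wait.
    late_servers: hypothetical map of {regions already assigned: [servers]}
                  describing servers that report in while assignment runs.
    """
    late_servers = late_servers or {}
    load = {server: 0 for server in initial_servers}
    assignment = {}
    for i, region in enumerate(regions):
        # Late reporters join with zero regions.
        for server in late_servers.get(i, []):
            load.setdefault(server, 0)
        if not load:
            raise RuntimeError("no regionservers have reported in")
        # Always picking the least-loaded server fills a late reporter up to
        # the current level before earlier reporters receive more regions.
        # Ties go to the earliest reporter (dicts keep insertion order).
        target = min(load, key=load.get)
        assignment[region] = target
        load[target] += 1
    return assignment, load


def balancer_may_start(all_assigned_at, now, dampening_seconds):
    """True once the dampening period after the last assignment has passed."""
    return now - all_assigned_at >= dampening_seconds


if __name__ == "__main__":
    regions = [f"r{i}" for i in range(10)]
    _, load = assign_on_startup(regions, ["rs1", "rs2"], {4: ["rs3"]})
    print(load)
```

In this toy run, rs3 reports in after four regions have been assigned, takes 
the next two regions on its own, and the remaining regions are then spread 
across all three servers.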

Combine the above with changes to start-hbase.sh and hbase-daemons.sh to start 
the regionservers ahead of the master and start the regionservers in parallel 
rather than serially, and I think the startup behavior will improve. 

Balancing regions in steady state, especially when servers arrive late or are 
newly added to the cluster, is a different proposition, I think. Billy had some 
ideas about that in an earlier comment. Some special-case handling of META is 
also in order.

Concerning balancing in steady state, 'load balancing' attempts to ensure that 
the workload on each host is within a small margin of the workload on every 
other host in the system. 'Load leveling', on the other hand, is a more relaxed 
approach that only seeks to avoid congestion on any one host. Balancing is 
proactive; leveling is reactive. I think both achieve the same end over time 
(with balancing "trying harder") and since leveling is simpler and requires 
little work or coordination on the part of the master, I thought I'd try that 
first.
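
To make the distinction concrete, here is a minimal sketch in Python. It is not 
HBase code: the function names, the tolerance parameter, and the 
congestion_limit threshold are all assumptions for illustration, and "workload" 
is simplified to a region count per server.

```python
def balance(load, tolerance=1):
    """Proactive: move regions until all servers are within `tolerance`."""
    tolerance = max(1, tolerance)  # a tolerance of 0 could oscillate forever
    load = dict(load)
    moves = []
    while True:
        busiest = max(load, key=load.get)
        idlest = min(load, key=load.get)
        if load[busiest] - load[idlest] <= tolerance:
            return load, moves
        load[busiest] -= 1
        load[idlest] += 1
        moves.append((busiest, idlest))


def level(load, congestion_limit):
    """Reactive: only shed regions from servers above `congestion_limit`."""
    load = dict(load)
    moves = []
    for server in list(load):
        while load[server] > congestion_limit:
            idlest = min(load, key=load.get)
            if load[idlest] + 1 > congestion_limit:
                break  # nowhere to shed without congesting another server
            load[server] -= 1
            load[idlest] += 1
            moves.append((server, idlest))
    return load, moves


if __name__ == "__main__":
    start = {"rs1": 12, "rs2": 5, "rs3": 4}
    print(balance(start))     # evens everything out
    print(level(start, 10))   # only relieves rs1
```

For the same starting point, balancing moves regions until every server holds 
the same count, while leveling stops as soon as no server is above the limit, 
so it issues fewer moves and needs less coordination.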


> region balancing is clumsy
> --------------------------
>
>                 Key: HBASE-862
>                 URL: https://issues.apache.org/jira/browse/HBASE-862
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>
> Daniel Leffel has an install of 500 regions on 4 nodes.  He's running 0.2.0.
> On restart, load balancing is running while the 600 regions are being 
> initially opened.  Makes for churn.  Load balancing should wait before it 
> cuts in.
> Have also seen on occasion that it will not find equilibrium after a restart.
> Adding a node is catastrophic.  >20% of the regions were closed and were 
> taking the longest time to show up on the new server.  I would think that the 
> region balancing would work in a more sophisticated and gradual manner.
