The starvation problem in current load balance algorithm
--------------------------------------------------------
Key: HBASE-3663
URL: https://issues.apache.org/jira/browse/HBASE-3663
Project: HBase
Issue Type: Bug
Reporter: Liyin Tang
Attachments: result_new_load_balance.txt, result_old_load_balance.txt
This is an interesting starvation case. There are 2 conditions to trigger this
problem.
Condition1: r/s - r/(s+1) << 1
Let r: the number of regions
Let s: the number of servers
Condition2: for each server, the load of each server is less or equal the ceil
of avg load.
Here is the unit test to verify this problem:
For example, there are 16 servers and 62 regions. The avg load is
3.875. And setting the slot to 0 to keep the load of each server either 3 or 4.
When a new server is coming, no server needs to assign regions to this new
server, since no one is larger the ceil of the avg.
(Setting slot to 0 is to easily trigger this situation, otherwise it needs much
larger numbers)
Solutions is pretty straightforward. Just compare the floor of the avg instead
of the ceil. This solution will evenly balance the load from the servers which
is little more loaded than others.
I also attached the comparison result for the case mentioned above between the
old balance algorithm and new balance algorithm. (I set the slot = 0 when
testing)
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira