Zhen Zhang created HELIX-547:
--------------------------------
Summary: AutoRebalancer may not converge in some rare situation
Key: HELIX-547
URL: https://issues.apache.org/jira/browse/HELIX-547
Project: Apache Helix
Issue Type: Bug
Reporter: Zhen Zhang
We discovered that AutoRebalancer may fail to converge to a stable mapping in
some rare situations. Assume we have a DB with 1024 partitions that uses the
LeaderStandby state model with 1 replica, on 6 nodes that are all alive. The
current mapping is:
{noformat}
...
MyDB_873={localhost_5=LEADER}
...
{noformat}
Given:
{noformat}
allNodes=allLiveNodes={localhost_0, ..., localhost_5}
stateCountMap: {LEADER=1, STANDBY=0}
capacity: 2147483647
{noformat}
AutoRebalanceStrategy#computePartitionAssignment outputs a new mapping:
{noformat}
...
MyDB_873={localhost_1=LEADER}
...
{noformat}
The Helix controller then sends LEADER->STANDBY to localhost_5 and
OFFLINE->STANDBY to localhost_1, so the next time the auto rebalancer is
triggered, the current mapping has become:
{noformat}
...
MyDB_873={localhost_5=STANDBY, localhost_1=STANDBY}
...
{noformat}
In this case, AutoRebalanceStrategy#computePartitionAssignment outputs a new
mapping that moves the leader back to localhost_5:
{noformat}
...
MyDB_873={localhost_5=LEADER}
...
{noformat}
Thus AutoRebalanceStrategy#computePartitionAssignment keeps assigning MyDB_873
to localhost_1 and localhost_5 alternately, without ever converging to a stable
mapping.
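
The oscillation can be replayed with a small toy model, which may be useful as
the basis for a regression check. This is only a sketch under stated
assumptions: reported_strategy is a hypothetical stand-in that replays the two
outputs reported above and is not the actual AutoRebalanceStrategy logic;
apply_one_hop and find_cycle are hypothetical helper names; and the controller
is assumed to move each replica one transition at a time through STANDBY and to
drop OFFLINE replicas from the mapping.

```python
from typing import Callable, Dict, List, Optional

Mapping = Dict[str, str]  # node name -> state, for a single partition

# Assumed first hop of each LeaderStandby transition (routing via STANDBY).
NEXT_HOP = {
    ("OFFLINE", "STANDBY"): "STANDBY",
    ("OFFLINE", "LEADER"): "STANDBY",
    ("STANDBY", "LEADER"): "LEADER",
    ("STANDBY", "OFFLINE"): "OFFLINE",
    ("LEADER", "STANDBY"): "STANDBY",
    ("LEADER", "OFFLINE"): "STANDBY",
}


def reported_strategy(current: Mapping) -> Mapping:
    """Hypothetical stand-in: replays only the outputs reported in this issue."""
    if current == {"localhost_5": "LEADER"}:
        return {"localhost_1": "LEADER"}
    if current == {"localhost_5": "STANDBY", "localhost_1": "STANDBY"}:
        return {"localhost_5": "LEADER"}
    return dict(current)


def apply_one_hop(current: Mapping, target: Mapping) -> Mapping:
    """Move every node one transition toward the target mapping."""
    result: Mapping = {}
    for node in sorted(set(current) | set(target)):
        src = current.get(node, "OFFLINE")
        dst = target.get(node, "OFFLINE")
        state = src if src == dst else NEXT_HOP[(src, dst)]
        if state != "OFFLINE":  # OFFLINE replicas are omitted from the mapping
            result[node] = state
    return result


def find_cycle(
    strategy: Callable[[Mapping], Mapping],
    start: Mapping,
    max_rounds: int = 10,
) -> Optional[List[Mapping]]:
    """Return the repeating sequence of current mappings, or None if no
    mapping repeats within max_rounds. A cycle of length 1 means the
    mapping is stable; a longer cycle means the rebalancer oscillates."""
    seen: List[Mapping] = []
    current = dict(start)
    for _ in range(max_rounds):
        if current in seen:
            return seen[seen.index(current):]
        seen.append(current)
        current = apply_one_hop(current, strategy(current))
    return None


if __name__ == "__main__":
    cycle = find_cycle(reported_strategy, {"localhost_5": "LEADER"})
    print("cycle length:", len(cycle) if cycle is not None else None)
    print(cycle)
```

Under these assumptions the model alternates between the two current mappings
described above, so find_cycle returns a cycle of length 2 rather than 1. A
test against the real strategy could apply the same idea: feed each computed
mapping back in as the current mapping and fail if the result never settles.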