Zhen Zhang created HELIX-547:
--------------------------------

             Summary: AutoRebalancer may not converge in some rare situation
                 Key: HELIX-547
                 URL: https://issues.apache.org/jira/browse/HELIX-547
             Project: Apache Helix
          Issue Type: Bug
            Reporter: Zhen Zhang


We discovered that AutoRebalancer may fail to converge to a stable mapping in 
some rare situations. Assume a DB with 1024 partitions that uses the 
LeaderStandby state model with a replica count of 1, on 6 nodes that are all 
alive. The current mapping is:
{noformat}
...
MyDB_873={localhost_5=LEADER}
...
{noformat}

Given:
{noformat}
allNodes=allLiveNodes={localhost_0, ..., localhost_5}
stateCountMap: {LEADER=1, STANDBY=0}
capacity: 2147483647
{noformat}

AutoRebalanceStrategy#computePartitionAssignment outputs the new mapping:
{noformat}
...
MyDB_873={localhost_1=LEADER}
...
{noformat}

The Helix controller then sends LEADER->STANDBY to localhost_5 and 
OFFLINE->STANDBY to localhost_1, so the next time the auto rebalancer is 
triggered, the current mapping has become:
{noformat}
...
MyDB_873={localhost_5=STANDBY, localhost_1=STANDBY}
...
{noformat}

In this case, AutoRebalanceStrategy#computePartitionAssignment outputs the new 
mapping:
{noformat}
...
MyDB_873={localhost_5=LEADER}
...
{noformat}

Thus AutoRebalanceStrategy#computePartitionAssignment keeps assigning MyDB_873 
to localhost_1 and localhost_5 alternately, without ever converging to a 
stable mapping.
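
The oscillation can be replayed with a small toy model. This is only a sketch 
under stated assumptions, not Helix code: toy_strategy is a hypothetical 
stand-in that simply returns the outputs quoted in this report, and 
apply_transitions assumes each node moves one step per round along 
OFFLINE <-> STANDBY <-> LEADER toward its target state. The final 
STANDBY->LEADER and STANDBY->OFFLINE transitions that close the loop are an 
assumption as well; the report does not list them. All function names are 
made up for illustration.

```python
from typing import Dict, List, Optional

Mapping = Dict[str, str]  # node -> state, for the single partition MyDB_873

# Assumed linear ordering of states for the toy transition model.
ORDER = ["OFFLINE", "STANDBY", "LEADER"]


def toy_strategy(current: Mapping) -> Mapping:
    """Hypothetical stand-in: replays the outputs quoted in the report."""
    if current == {"localhost_5": "LEADER"}:
        return {"localhost_1": "LEADER"}
    # With both nodes in STANDBY, the report shows localhost_5 chosen again.
    return {"localhost_5": "LEADER"}


def step_toward(state: str, goal: str) -> str:
    """Move at most one step along ORDER from state toward goal."""
    i, j = ORDER.index(state), ORDER.index(goal)
    if i < j:
        i += 1
    elif i > j:
        i -= 1
    return ORDER[i]


def apply_transitions(current: Mapping, target: Mapping) -> Mapping:
    """Toy controller round: one transition per node, OFFLINE nodes dropped."""
    result: Mapping = {}
    for node in sorted(set(current) | set(target)):
        new_state = step_toward(
            current.get(node, "OFFLINE"), target.get(node, "OFFLINE")
        )
        if new_state != "OFFLINE":
            result[node] = new_state
    return result


def find_cycle(start: Mapping, max_rounds: int = 10) -> Optional[List[Mapping]]:
    """Return the repeating sequence of mappings, or None if none is seen.

    A result of length 1 is a fixed point (converged); a longer result
    means the mapping oscillates.
    """
    seen: Dict[tuple, int] = {}
    history: List[Mapping] = []
    current = start
    for round_no in range(max_rounds):
        key = tuple(sorted(current.items()))
        if key in seen:
            return history[seen[key]:]
        seen[key] = round_no
        history.append(current)
        current = apply_transitions(current, toy_strategy(current))
    return None


cycle = find_cycle({"localhost_5": "LEADER"})
print(cycle)
```

In this toy model the result has length 2: the mapping alternates between 
localhost_5 as LEADER and both nodes in STANDBY, so no fixed point is reached.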




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
