[ https://issues.apache.org/jira/browse/HELIX-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210192#comment-14210192 ]
Zhen Zhang commented on HELIX-547:
----------------------------------
This is related to HELIX-540 and HELIX-541.
> AutoRebalancer may not converge in some rare situation
> ------------------------------------------------------
>
> Key: HELIX-547
> URL: https://issues.apache.org/jira/browse/HELIX-547
> Project: Apache Helix
> Issue Type: Bug
> Reporter: Zhen Zhang
>
> We discovered that AutoRebalancer may fail to converge to a stable mapping in
> some rare situations. Assume we have a DB with 1024 partitions that uses the
> LeaderStandby state model with a replica count of 1, on 6 nodes that are all
> alive. The current mapping is:
> {noformat}
> ...
> MyDB_873={localhost_5=LEADER}
> ...
> {noformat}
> Given:
> {noformat}
> allNodes=allLiveNodes={localhost_0, ..., localhost_5}
> stateCountMap: {LEADER=1, STANDBY=0}
> capacity: 2147483647
> {noformat}
> AutoRebalanceStrategy#computePartitionAssignment will output a new mapping:
> {noformat}
> ...
> MyDB_873={localhost_1=LEADER}
> ...
> {noformat}
> The Helix controller will then send LEADER->STANDBY to localhost_5 and
> OFFLINE->STANDBY to localhost_1, so the next time the auto rebalancer is
> triggered, the current mapping has become:
> {noformat}
> ...
> MyDB_873={localhost_5=STANDBY, localhost_1=STANDBY}
> ...
> {noformat}
> In this case, AutoRebalanceStrategy#computePartitionAssignment will output
> a new mapping:
> {noformat}
> ...
> MyDB_873={localhost_5=LEADER}
> ...
> {noformat}
> Thus AutoRebalanceStrategy#computePartitionAssignment keeps assigning
> localhost_1 and localhost_5 to MyDB_873 alternately, without converging to a
> stable mapping.
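The oscillation described above can be sketched as a small toy simulation. This is not Helix code. `compute_assignment` is a hypothetical stand-in that only replays the two outputs reported in the description, and `apply_one_round` assumes the controller moves each node one LeaderStandby transition per round (including STANDBY->LEADER and STANDBY->OFFLINE in the second round, which the description does not state explicitly). `find_cycle` is likewise an illustrative helper name.

```python
# Toy model of the reported oscillation for partition MyDB_873.
# NOT Helix code: every function here is a hypothetical illustration.

# Assumed single-step transitions of the LeaderStandby state model.
STEP_UP = {"OFFLINE": "STANDBY", "STANDBY": "LEADER", "LEADER": "LEADER"}
STEP_DOWN = {"LEADER": "STANDBY", "STANDBY": "OFFLINE", "OFFLINE": "OFFLINE"}


def compute_assignment(current):
    """Stand-in for computePartitionAssignment: replays the reported outputs."""
    if current == {"localhost_5": "LEADER"}:
        return {"localhost_1": "LEADER"}
    if current == {"localhost_5": "STANDBY", "localhost_1": "STANDBY"}:
        return {"localhost_5": "LEADER"}
    return dict(current)  # assumption: any other mapping is already stable


def apply_one_round(current, target):
    """Assumed controller step: one state transition per node per round."""
    result = {}
    for node in sorted(set(current) | set(target)):
        state = current.get(node, "OFFLINE")
        # Nodes in the target move toward LEADER, all others toward OFFLINE.
        new_state = STEP_UP[state] if node in target else STEP_DOWN[state]
        if new_state != "OFFLINE":
            result[node] = new_state
    return result


def find_cycle(start, max_rounds=10):
    """Return the repeating list of mappings, or None if the mapping settles."""
    seen = []
    current = dict(start)
    for _ in range(max_rounds):
        target = compute_assignment(current)
        if target == current:
            return None  # converged: the strategy keeps the current mapping
        if current in seen:
            return seen[seen.index(current):]
        seen.append(current)
        current = apply_one_round(current, target)
    return None


if __name__ == "__main__":
    print(find_cycle({"localhost_5": "LEADER"}))
```

Under these assumptions, starting from `{localhost_5=LEADER}` the simulated mapping alternates between `{localhost_5=LEADER}` and `{localhost_5=STANDBY, localhost_1=STANDBY}` and never settles, which matches the behaviour reported in the description.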