[ 
https://issues.apache.org/jira/browse/HELIX-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630532#comment-15630532
 ] 

ASF GitHub Bot commented on HELIX-400:
--------------------------------------

GitHub user mkscrg opened a pull request:

    https://github.com/apache/helix/pull/58

    helix-core: AutoRebalancer should include only numbered states in 
`currentMapping`

    `AutoRebalancer` constructs a `currentMapping` (`Map<PartitionId, 
Map<ParticipantId, State>>`), which it passes to 
`AutoRebalanceStrategy#computePartitionAssignment()`. `AutoRebalanceStrategy` 
(`ARS`) uses the mapping to sort the live nodes by the number of partitions 
they hold.
    
    In `helix-0.6.x`, `currentMapping` contains _all states_, including "null" 
states like `DROPPED` or `OFFLINE`. This breaks `ARS`'s node sorting, causing 
it to incorrectly move partitions when nodes restart after disconnecting.
    
    `helix-0.7.x` does not have this issue. In `helix-0.6.x`, it was introduced 
between `0.6.2-incubating` and `0.6.3` by this commit:
    
    > [[HELIX-400] Remove all references to the old full auto rebalancing 
code](https://github.com/apache/helix/commit/8d99778a30d10f529ee0757286efa84ea581b5bf)
    
    See also
    - the recent port of [HELIX-543] (#56) to `helix-0.6.x`, which was intended 
to avoid unnecessary partition movement. That port was ineffective due to this 
issue.
    - the [mailing list](http://mail-archives.apache.org/mod_mbox/helix-user/201610.mbox/%3CCAC56g41ejjcSi1P-Ohp3esyGqemBgFoji2Gy8tZQnJMo156OpA%40mail.gmail.com%3E) 
thread for more background
    
    
    ### Example
    
    Consider this scenario:
    
    ```
    OnlineOffline state model
    2 nodes "NODE_0" and "NODE_1"
    1 resource "P" w/ 1 replica, 1 partition
    ----------
    rebalance
    > currentMapping: `{P: {NODE_0: ONLINE}}`
    stop NODE_0
    > currentMapping: `{P: {NODE_1: ONLINE}}`
    start NODE_0
    > currentMapping: `{P: {NODE_0: OFFLINE, NODE_1: ONLINE}}`
    ```
    
    `ARS#computePartitionAssignment()` sorts the live nodes by the number of 
partitions they hold, based on `currentMapping`, then reassigns partitions 
based on that sort. (The sort breaks ties by comparing the node names.) After 
`NODE_0` restarts, its `OFFLINE` entry is counted as a held partition, so both 
nodes appear to hold one partition and the tie is broken by name. The sort is 
therefore `[NODE_0, NODE_1]`, and the `ONLINE` partition is incorrectly moved 
back to `NODE_0`.
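
The scenario above can be reproduced with a small standalone sketch. This is not Helix code: the class and method names are hypothetical, plain strings stand in for `PartitionId`, `ParticipantId` and `State`, and it assumes that `ONLINE` is the only "numbered" state in the OnlineOffline model and that nodes holding more partitions sort first (which is what the example implies). It only illustrates how filtering `currentMapping` changes the node sort; it does not perform the actual partition assignment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; none of these names come from Helix itself.
final class CurrentMappingSketch {

    // Assumption: ONLINE is the only "numbered" state in OnlineOffline.
    static final Set<String> NUMBERED_STATES = Set.of("ONLINE");

    // Drop every (node, state) entry whose state is not a numbered state.
    static Map<String, Map<String, String>> keepNumberedStates(
            Map<String, Map<String, String>> mapping) {
        Map<String, Map<String, String>> filtered = new LinkedHashMap<>();
        mapping.forEach((partition, replicas) -> {
            Map<String, String> kept = new LinkedHashMap<>();
            replicas.forEach((node, state) -> {
                if (NUMBERED_STATES.contains(state)) {
                    kept.put(node, state);
                }
            });
            filtered.put(partition, kept);
        });
        return filtered;
    }

    // Sort live nodes by how many partitions the mapping says they hold
    // (assumed: most partitions first), breaking ties by node name.
    static List<String> sortNodes(List<String> liveNodes,
                                  Map<String, Map<String, String>> mapping) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String node : liveNodes) {
            counts.put(node, 0);
        }
        for (Map<String, String> replicas : mapping.values()) {
            for (String node : replicas.keySet()) {
                // Every entry counts, whatever its state; nodes that are
                // not live are ignored.
                counts.computeIfPresent(node, (n, c) -> c + 1);
            }
        }
        List<String> sorted = new ArrayList<>(liveNodes);
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<String> liveNodes = List.of("NODE_0", "NODE_1");
        // currentMapping after NODE_0 restarts, as in the scenario above.
        Map<String, Map<String, String>> afterRestart =
                Map.of("P", Map.of("NODE_0", "OFFLINE", "NODE_1", "ONLINE"));

        // Unfiltered: both nodes count one partition, the name breaks the tie.
        System.out.println(sortNodes(liveNodes, afterRestart)); // [NODE_0, NODE_1]

        // Numbered states only: NODE_1 is the only node holding a partition.
        System.out.println(
                sortNodes(liveNodes, keepNumberedStates(afterRestart))); // [NODE_1, NODE_0]
    }
}
```

With the filtered mapping, `NODE_1` sorts ahead of `NODE_0`, so under these assumptions the `ONLINE` replica would stay where it is.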

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mkscrg/helix rebalance-numbered-states-only

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/58.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #58
    
----
commit 131e67bd7d98ae18eb4bbe0356cdd3a088f12c18
Author: Mike Craig <[email protected]>
Date:   2016-11-02T20:22:11Z

    helix-core: AutoRebalancer should include only numbered states in 
`currentMapping`

----


> 0.6.x still calls the old rebalancing algorithm for no reason
> -------------------------------------------------------------
>
>                 Key: HELIX-400
>                 URL: https://issues.apache.org/jira/browse/HELIX-400
>             Project: Apache Helix
>          Issue Type: Sub-task
>            Reporter: Kanak Biscuitwala
>            Assignee: Kanak Biscuitwala
>             Fix For: 0.6.3
>
>
> After calling the new algorithm, the old algorithm is called. Typically this 
> is a no-op, except in the case of disabled partitions, where it might do the 
> wrong thing. In any case, this shouldn't exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
