[ https://issues.apache.org/jira/browse/HELIX-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605704#comment-15605704 ]

ASF GitHub Bot commented on HELIX-543:
--------------------------------------

GitHub user lei-xia opened a pull request:

    https://github.com/apache/helix/pull/56

    [HELIX-543] Avoid moving partitions unnecessarily when auto-rebalancing.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lei-xia/helix helix-0.6.x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/56.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #56
    
----
commit 45ebe767533a9c014bf37c30e4a6a62652538b5a
Author: Lei Xia <[email protected]>
Date:   2016-10-25T16:01:35Z

    [HELIX-543] Avoid moving partitions unnecessarily when auto-rebalancing.

----


> Single partition unnecessarily moved
> ------------------------------------
>
>                 Key: HELIX-543
>                 URL: https://issues.apache.org/jira/browse/HELIX-543
>             Project: Apache Helix
>          Issue Type: Bug
>          Components: helix-core
>    Affects Versions: 0.7.1, 0.6.4
>            Reporter: Tom Widmer
>            Assignee: kishore gopalakrishna
>            Priority: Minor
>
> (Copied from mailing list)
> I have some resources that I use with the OnlineOffline state model but which 
> only have a single partition at the moment (essentially, Helix is just giving 
> me a simple leader election to decide who controls the resource - I don’t care 
> which participant has it, as long as only one does). However, with full-auto 
> rebalancing, I find that the ‘first’ instance (alphabetically, I think) always 
> gets the resource whenever it’s up. So if I take down the first node so that 
> the partition transfers to the 2nd node, then bring the 1st node back up, the 
> resource transfers back unnecessarily.
> Note that this issue also affects multi-partition resources; it’s just a bit 
> less noticeable. With 3 nodes and 4 partitions, say, the partitions are 
> always allocated 2, 1, 1. Take node 1 down and the allocation becomes 
> 0, 2, 2; bring node 1 back up and Helix unnecessarily moves 2 partitions to 
> restore 2, 1, 1, rather than making the minimum move to achieve ‘balance’, 
> which would be to move 1 partition from instance 2 or 3 back to instance 1.
> I can see the code in question in 
> AutoRebalanceStrategy.typedComputePartitionAssignment, where the 
> distRemainder is allocated to the first nodes alphabetically, so the 
> computed capacities of the nodes are not all equal.
> The proposed solution is to sort the nodes by the number of partitions they 
> already have assigned (descending), so that the extra remainder capacity 
> goes to the nodes that already hold the most partitions and the problem goes 
> away; a sketch of both behaviours follows this quoted description.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
