Tom Widmer created HELIX-543:
--------------------------------

             Summary: Single partition unnecessarily moved
                 Key: HELIX-543
                 URL: https://issues.apache.org/jira/browse/HELIX-543
             Project: Apache Helix
          Issue Type: Bug
          Components: helix-core
    Affects Versions: 0.6.4, 0.7.1
            Reporter: Tom Widmer
            Priority: Minor


(Copied from mailing list)

I have some resources that I use with the OnlineOffline state but which only 
have a single partition at the moment (essentially, Helix is just giving me a 
simple leader election to decide who controls the resource - I don’t care which 
participant has it, as long as only one does). However, with full auto 
rebalance, I find that the ‘first’ instance (alphabetically I think) always 
gets the resource when it’s up. So if I take down the first node so the 
partition transfers to the 2nd node, then bring back up the 1st node, the 
resource transfers back unnecessarily.

Note that this issue also affects multi-partition resources; it’s just a bit 
less noticeable. With 3 nodes and 4 partitions, say, the partitions are always 
allocated 2, 1, 1. If node 1 is down, the allocation becomes 0, 2, 2, and 
bringing node 1 back up unnecessarily moves 2 partitions to restore 2, 1, 1. 
The minimum move to achieve ‘balance’ would be to move 1 partition from 
instance 2 or 3 back to instance 1.

I can see the code in question in 
AutoRebalanceStrategy.typedComputePartitionAssignment, where the distRemainder 
is allocated to the first nodes alphabetically, so that the capacity of all 
nodes is not equal.
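
To illustrate the effect, here is a minimal Python sketch of the capacity 
calculation as described above. It is not the Helix code: `capacities` and 
`moves_needed` are hypothetical names, and the only thing taken from the 
description is the idea of handing the remainder to the alphabetically first 
nodes.

```python
def capacities(nodes, num_partitions):
    """Hypothetical model: split partitions evenly and give the
    remainder to the first nodes in alphabetical order."""
    ordered = sorted(nodes)
    base, dist_remainder = divmod(num_partitions, len(ordered))
    return {
        node: base + (1 if i < dist_remainder else 0)
        for i, node in enumerate(ordered)
    }


def moves_needed(current, target):
    """Count the partitions that must leave a node to reach the target counts."""
    return sum(max(current[n] - target[n], 0) for n in current)


# 3 nodes, 4 partitions: node1 always gets the extra slot.
target = capacities(["node1", "node2", "node3"], 4)
# node1 has just come back up, so it currently holds nothing.
current = {"node1": 0, "node2": 2, "node3": 2}
print(target, moves_needed(current, target))
```

Under these assumptions the target is always 2, 1, 1, so two partitions move 
when node1 returns, even though moving one would be enough to balance.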

The proposed solution is to sort the nodes by the number of partitions they 
already have assigned (most first), so that the nodes already holding the most 
partitions are given the higher capacity, which should make the problem go away.
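
A minimal Python sketch of this idea follows. It is an illustration only, not 
a patch against AutoRebalanceStrategy: the function names are hypothetical, 
and breaking ties alphabetically is an assumption made here to keep the 
result deterministic.

```python
def capacities_by_load(current, num_partitions):
    """Hypothetical variant: give the remainder to the nodes that already
    hold the most partitions (ties broken alphabetically, an assumption)."""
    ordered = sorted(current, key=lambda n: (-current[n], n))
    base, dist_remainder = divmod(num_partitions, len(ordered))
    return {
        node: base + (1 if i < dist_remainder else 0)
        for i, node in enumerate(ordered)
    }


def moves_needed(current, target):
    """Count the partitions that must leave a node to reach the target counts."""
    return sum(max(current[n] - target[n], 0) for n in current)


# node1 has just come back up and holds nothing; node2 and node3 hold 2 each.
current = {"node1": 0, "node2": 2, "node3": 2}
target = capacities_by_load(current, 4)
print(target, moves_needed(current, target))

# Single-partition case: the partition stays where it is.
single = {"node1": 0, "node2": 1}
print(capacities_by_load(single, 1), moves_needed(single, capacities_by_load(single, 1)))
```

In this sketch the 3-node case needs only one move, and the single-partition 
resource stays on node2 when node1 comes back.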



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
