[ 
https://issues.apache.org/jira/browse/HELIX-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720934#comment-13720934
 ] 

Kanak Biscuitwala commented on HELIX-141:
-----------------------------------------

The decision was to implement this in stages, first as an algorithm that will 
do automatic rebalancing that tries to minimize movement, and then to extend it 
to a more generic knapsack solver that can handle the case where replicas may 
not put equal load on each machine.

The design of the first part is described in detail here: 
https://cwiki.apache.org/confluence/display/HELIX/Automatic+Rebalancing+Strategy

Basically, there are a number of constraints: replicas of the same partition 
must each live on different nodes, currently assigned replicas should not move 
much if possible when nodes are added or removed, any provided maximum capacity 
should not be exceeded, and each node should have roughly an equal number of 
assigned replicas. To do this, we implemented an algorithm that first computes 
a "preferred assignment" of replicas to nodes where we use a formula similar to 
the (p + r)%N above, where N is the expected number of nodes that will be up 
when the system is up and running.

Then, we compute the following sets: replicas currently assigned to their 
preferred locations, replicas assigned to non-preferred locations, and 
unassigned replicas. Using these sets, we do the following:

1. If a replica is assigned to an oversubscribed node, try to move it to its 
preferred node if there is room
2. Randomly assign the unassigned replicas
3. Randomly assign the remaining non-preferred assignments that currently live 
on over-subscribed nodes

There may be unassigned replicas remaining simply because the number of current 
nodes do not have enough capacity to support them. We think this is acceptable 
because as more nodes are added, the algorithm will naturally pick them up and 
assign them.

A fix and an end-to-end test are currently out for review.
                
> Autorebalance does not work reliably and fails when replica>1
> -------------------------------------------------------------
>
>                 Key: HELIX-141
>                 URL: https://issues.apache.org/jira/browse/HELIX-141
>             Project: Apache Helix
>          Issue Type: Bug
>            Reporter: kishore gopalakrishna
>            Assignee: Kanak Biscuitwala
>
> Autorebalance has a bug in calculating the assignment of partitions to nodes. 
> It does not consider the fact that  a partition can have replicas. This was 
> not tested because we generally use 1 replica ( consumer grouping). 
> As we are doing this, we might want to fix/add the following features
> #1. Allow user to set a minimum number of nodes to be up before computing the 
> assignment. This minimizes the number of re-assignments when nodes start up.
> #2. Have the concept of preferred node for a partition,replica. Which means 
> when ever a partition,replica is orphaned try to first place it on its 
> preferred location. Preferred location can be something like (p+r)%N. This 
> will require us to use the total number of nodes to compute the preferred 
> location. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to