[
https://issues.apache.org/jira/browse/HELIX-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720934#comment-13720934
]
Kanak Biscuitwala commented on HELIX-141:
-----------------------------------------
The decision was to implement this in stages, first as an algorithm that will
do automatic rebalancing that tries to minimize movement, and then to extend it
to a more generic knapsack solver that can handle the case where replicas may
not put equal load on each machine.
The design of the first part is described in detail here:
https://cwiki.apache.org/confluence/display/HELIX/Automatic+Rebalancing+Strategy
Basically, there are a number of constraints: replicas of the same partition
must each live on different nodes, currently assigned replicas should not move
much if possible when nodes are added or removed, any provided maximum capacity
should not be exceeded, and each node should have roughly an equal number of
assigned replicas. To do this, we implemented an algorithm that first computes
a "preferred assignment" of replicas to nodes where we use a formula similar to
the (p + r)%N above, where N is the expected number of nodes that will be up
when the system is up and running.
Then, we compute the following sets: replicas currently assigned to their
preferred locations, replicas assigned to non-preferred locations, and
unassigned replicas. Using these sets, we do the following:
1. If a replica is assigned to an oversubscribed node, try to move it to its
preferred node if there is room
2. Randomly assign the unassigned replicas
3. Randomly assign the remaining non-preferred assignments that currently live
on over-subscribed nodes
There may be unassigned replicas remaining simply because the number of current
nodes do not have enough capacity to support them. We think this is acceptable
because as more nodes are added, the algorithm will naturally pick them up and
assign them.
A fix and an end-to-end test are currently out for review.
> Autorebalance does not work reliably and fails when replica>1
> -------------------------------------------------------------
>
> Key: HELIX-141
> URL: https://issues.apache.org/jira/browse/HELIX-141
> Project: Apache Helix
> Issue Type: Bug
> Reporter: kishore gopalakrishna
> Assignee: Kanak Biscuitwala
>
> Autorebalance has a bug in calculating the assignment of partitions to nodes.
> It does not consider the fact that a partition can have replicas. This was
> not tested because we generally use 1 replica ( consumer grouping).
> As we are doing this, we might want to fix/add the following features
> #1. Allow user to set a minimum number of nodes to be up before computing the
> assignment. This minimizes the number of re-assignments when nodes start up.
> #2. Have the concept of preferred node for a partition,replica. Which means
> when ever a partition,replica is orphaned try to first place it on its
> preferred location. Preferred location can be something like (p+r)%N. This
> will require us to use the total number of nodes to compute the preferred
> location.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira