[
https://issues.apache.org/jira/browse/CASSANDRA-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902082#action_12902082
]
Stu Hood commented on CASSANDRA-1418:
-------------------------------------
* One more component: trunk only implements 'case 2' of Ruhl's. It doesn't
optimize for case 1, where a node is assuming load from a direct neighbor, and
doesn't need to fully dump-data/leave-ring. We'll probably want a separate
subtask to implement this portion.
* Calculation of whether a node should move (based on the ε ratio to it's
neighbor) should probably change to additionally depend on whether the movement
would bring both nodes closer to the ideal load ω.
> Automatic, online load balancing
> --------------------------------
>
> Key: CASSANDRA-1418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1418
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Stu Hood
> Fix For: 0.8
>
>
> h2. Goal
> CASSANDRA-192 began with the intention of implementing full cluster load
> balancing, but ended up being (wisely) limited to a manual load balancing
> operation. This issue is an umbrella ticket for finishing the job of
> implementing automatic, always-on load balancing.
> It is possible to implement very efficient load balancing operations with a
> single process directing the rebalancing of all nodes, but avoiding such a
> central process and allowing individual nodes to make their own movement
> decisions would be ideal.
> h2. Components
> h3. Optimal movements for individual nodes
> h4. Ruhl
> One such approach is the Ruhl algorithm described on 192:
> https://issues.apache.org/jira/browse/CASSANDRA-192#action_12713079 . But as
> described, it performs excessive movement for large hotspots, and can take a
> long time to reach equilibrium. Consider the following ring:
> ||token||load||
> |a|5|
> |c|5|
> |e|5|
> |f|40|
> |k|5|
> Assuming that node 'a' is the first to discover that 'f' is overloaded: it
> will apply Case 2, and assume half of 'f's load by moving to 'i', leaving
> both with 20 units. But this is not a optimal movement, because both 'f' and
> 'a/i' will still be holding data that they will need to give away.
> Additionally, 'a/i' can't begin giving the data away until it has finished
> receiving it.
> If node 'e' is the first to discover that 'f' is overloaded, it will apply
> Case 1, and 'f' will give half of its load to 'e' by moving to 'i'. Again,
> this is a non-optimal movement, because it will result in both 'e' and 'f/i'
> holding data that they need to give away.
> h4. Adding load awareness to Ruhl
> Luckily, there appears to be a simple adjustment to the Ruhl algorithm that
> solves this problem by taking advantage of the fact that Cassandra knows the
> total load of a cluster, and can use it to calculate the average/ideal load
> ω. Once node j has decided it should take load from node i (based on the ε
> value in Ruhl), rather than node j taking 1/2 of the load on node i, it
> should chose a token such that either i or j ends up with a load within ε*ω
> of ω.
> Again considering the ring described above, and assuming ε == 1.0, the total
> load for the 5 nodes is 60, giving a ω of 12. If node 'a' is the first to
> discover 'f', it will choose to move to 'j' (a token that takes 12 or ω load
> units from 'f'), leaving 'f' with a load of 28. When combined with the
> improvement in the next section, this is closer to being an optimal movement,
> because 'a/j' will at worst have ε*ω of load to give away, and 'f' is in a
> position to start more movements.
> h3. Automatic load balancing
> Since the Ruhl algorithm only requires a node to make a decision based on
> itself and one other node, it should be relatively straightforward to add a
> timer on each node that periodically wakes up and executes the modifiied Ruhl
> algorithm if it is not already in the process of moving (based on pending
> ranges).
> Automatic balancing should probably be enabled by default, and should have a
> configurable per-node bandwidth cap.
> h3. Allowing concurrent moves on a node
> Allowing a node to give away multiple ranges at once allows for the type of
> quick balancing that is typically only attributed to vnodes. If a node is a
> hotspot, such as in the example above, the node should be able to quickly
> dump the load in a manner that causes minimal load on the rest of the
> cluster. Rather than transferring to 1 target at 10 MB/s, a hotspot can give
> to 5 targets at 2 MB/s each.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.