[jira] Created: (CASSANDRA-1418) Automatic, online load balancing

Stu Hood (JIRA) Sun, 22 Aug 2010 00:05:03 -0700

Automatic, online load balancing
--------------------------------

                 Key: CASSANDRA-1418
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1418
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Stu Hood
             Fix For: 0.8



h2. Goal

CASSANDRA-192 began with the intention of implementing full cluster load 
balancing, but ended up being (wisely) limited to a manual load balancing 
operation. This issue is an umbrella ticket for finishing the job of 
implementing automatic, always-on load balancing.

It is possible to implement very efficient load balancing operations with a 
single process directing the rebalancing of all nodes, but avoiding such a 
central process and allowing individual nodes to make their own movement 
decisions would be ideal.

h2. Components

h3. Optimal movements for individual nodes

h4. Ruhl

One such approach is the Ruhl algorithm described on 192: 
https://issues.apache.org/jira/browse/CASSANDRA-192#action_12713079 . But as 
described, it performs excessive movement for large hotspots, and can take a 
long time to reach equilibrium. Consider the following ring:
||token||load||
|a|5|
|c|5|
|e|5|
|f|40|
|k|5|
Assuming that node 'a' is the first to discover that 'f' is overloaded: it will 
apply Case 2, and assume half of 'f's load by moving to 'i', leaving both with 
20 units. But this is not a optimal movement, because both 'f' and 'a/i' will 
still be holding data that they will need to give away. Additionally, 'a/i' 
can't begin giving the data away until it has finished receiving it.

If node 'e' is the first to discover that 'f' is overloaded, it will apply Case 
1, and 'f' will give half of its load to 'e' by moving to 'i'. Again, this is a 
non-optimal movement, because it will result in both 'e' and 'f/i' holding data 
that they need to give away.

h4. Adding load awareness to Ruhl

Luckily, there appears to be a simple adjustment to the Ruhl algorithm that 
solves this problem by taking advantage of the fact that Cassandra knows the 
total load of a cluster, and can use it to calculate the average/ideal load ω. 
Once node j has decided it should take load from node i (based on the ε value 
in Ruhl), rather than node j taking 1/2 of the load on node i, it should chose 
a token such that either i or j ends up with a load within ε*ω of ω.

Again considering the ring described above, and assuming ε == 1.0, the total 
load for the 5 nodes is 60, giving a ω of 12. If node 'a' is the first to 
discover 'f', it will choose to move to 'j' (a token that takes 12 or ω load 
units from 'f'), leaving 'f' with a load of 28. When combined with the 
improvement in the next section, this is closer to being an optimal movement, 
because 'a/j' will at worst have ε*ω of load to give away, and 'f' is in a 
position to start more movements.

h3. Automatic load balancing

Since the Ruhl algorithm only requires a node to make a decision based on 
itself and one other node, it should be relatively straightforward to add a 
timer on each node that periodically wakes up and executes the modifiied Ruhl 
algorithm if it is not already in the process of moving (based on pending 
ranges).

Automatic balancing should probably be enabled by default, and should have a 
configurable per-node bandwidth cap.

h3. Allowing concurrent moves on a node

Allowing a node to give away multiple ranges at once allows for the type of 
quick balancing that is typically only attributed to vnodes. If a node is a 
hotspot, such as in the example above, the node should be able to quickly dump 
the load in a manner that causes minimal load on the rest of the cluster. 
Rather than transferring to 1 target at 10 MB/s, a hotspot can give to 5 
targets at 2 MB/s each.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Created: (CASSANDRA-1418) Automatic, online load balancing

Reply via email to