[ 
https://issues.apache.org/jira/browse/CASSANDRA-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713060#action_12713060
 ] 

Jonathan Ellis commented on CASSANDRA-192:
------------------------------------------

The Rao paper (or RLS+03, as it is cited in the other) describes three schemes 
to balance the load among machines by giving each several "virtual nodes" (as 
in Dynamo) and moving or splitting nodes from heavily loaded machines to 
lightly loaded.  A scheme like this is relatively simple once you already have 
a virtual node system in place but we do not and we would prefer not to add 
that layer of complexity if we can avoid it.


> Load balancing
> --------------
>
>                 Key: CASSANDRA-192
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-192
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>             Fix For: 0.4
>
>
> We need to be able to spread load evenly across a cluster to mitigate keys 
> not being uniformly distributed as well as heterogeneous nodes in a cluster.  
> The former is particularly likely to be a problem when using the 
> OrderPreservingPartitioner, since the keys are not randomized by a hash 
> function.
> Avinash suggested three papers on load balancing in this thread: 
> http://groups.google.com/group/cassandra-dev/msg/b3d67acf35801c41
> Of these, the useful ones are
>  http://www.iptps.org/papers-2004/karger-load-balance.pdf (Simple Efficient 
> Load Balancing Algorithms for Peer-to-Peer Systems by David R. Karger and 
> Matthias Ruhl)
>  http://iptps03.cs.berkeley.edu/final-papers/load_balancing.ps (Load 
> Balancing in Structured P2P Systems by Ananth Rao et al)
> The third, 
> http://iptps03.cs.berkeley.edu/final-papers/simple_load_balancing.ps (Simple 
> Load Balancing for Distributed Hash Tables by John Byers et al) is not 
> applicable to Cassandra's design.  ("First, we suggest the direct application 
> of the 'power of two choices' paradigm, whereby an item is stored at the less 
> loaded of two (or more) random alternatives. We then consider how associating 
> a small constant number of hash values with a key can naturally be extended 
> to support other load balancing strategies.")

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to