[ 
https://issues.apache.org/jira/browse/CASSANDRA-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-192:
-------------------------------------

    Description: 
We need to be able to spread load evenly across a cluster to mitigate keys not 
being uniformly distributed as well as heterogeneous nodes in a cluster.  The 
former is particularly likely to be a problem when using the 
OrderPreservingPartitioner, since the keys are not randomized by a hash 
function.

Avinash suggested three papers on load balancing in this thread: 
http://groups.google.com/group/cassandra-dev/msg/b3d67acf35801c41

Of these, the useful ones are
 http://www.iptps.org/papers-2004/karger-load-balance.pdf (Simple Efficient 
Load Balancing Algorithms for Peer-to-Peer Systems by David R. Karger and 
Matthias Ruhl)
 http://iptps03.cs.berkeley.edu/final-papers/load_balancing.ps (Load Balancing 
in Structured P2P Systems by Ananth Rao et al)

The third, 
http://iptps03.cs.berkeley.edu/final-papers/simple_load_balancing.ps (Simple 
Load Balancing for Distributed Hash Tables by John Byers et al) is not 
applicable to Cassandra's design.  ('First, we suggest the direct application 
of the lsquolsquopower of two choicesrsquorsquo paradigm, whereby an item is 
stored at the less loaded of two (or more) random alternatives. We then 
consider how associating a small constant number of hash values with a key can 
naturally be extended to support other load balancing strategies.')

  was:
We need to be able to spread load evenly across a cluster to mitigate keys not 
being uniformly distributed as well as heterogeneous nodes in a cluster.  The 
former is particularly likely to be a problem when using the 
OrderPreservingPartitioner, since the keys are not randomized by a hash 
function.

Avinash suggested three papers on load balancing in this thread: 
http://groups.google.com/group/cassandra-dev/msg/b3d67acf35801c41

Of these, the useful ones are
 http://www.iptps.org/papers-2004/karger-load-balance.pdf (Simple Efficient 
Load Balancing Algorithms for Peer-to-Peer Systems by David R. Karger and 
Matthias Ruhl)
 http://iptps03.cs.berkeley.edu/final-papers/load_balancing.ps (Load Balancing 
in Structured P2P Systems by Ananth Rao et al)

The third, 
http://iptps03.cs.berkeley.edu/final-papers/simple_load_balancing.ps (Simple 
Load Balancing for Distributed Hash Tables by John Byers et al) is not 
applicable to Cassandra's design.  ("First, we suggest the direct application 
of the lsquolsquopower of two choicesrsquorsquo paradigm, whereby an item is 
stored at the less loaded of two (or more) random alternatives. We then 
consider how associating a small constant number of hash values with a key can 
naturally be extended to support other load balancing strategies.")


> Load balancing
> --------------
>
>                 Key: CASSANDRA-192
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-192
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>             Fix For: 0.4
>
>
> We need to be able to spread load evenly across a cluster to mitigate keys 
> not being uniformly distributed as well as heterogeneous nodes in a cluster.  
> The former is particularly likely to be a problem when using the 
> OrderPreservingPartitioner, since the keys are not randomized by a hash 
> function.
> Avinash suggested three papers on load balancing in this thread: 
> http://groups.google.com/group/cassandra-dev/msg/b3d67acf35801c41
> Of these, the useful ones are
>  http://www.iptps.org/papers-2004/karger-load-balance.pdf (Simple Efficient 
> Load Balancing Algorithms for Peer-to-Peer Systems by David R. Karger and 
> Matthias Ruhl)
>  http://iptps03.cs.berkeley.edu/final-papers/load_balancing.ps (Load 
> Balancing in Structured P2P Systems by Ananth Rao et al)
> The third, 
> http://iptps03.cs.berkeley.edu/final-papers/simple_load_balancing.ps (Simple 
> Load Balancing for Distributed Hash Tables by John Byers et al) is not 
> applicable to Cassandra's design.  ('First, we suggest the direct application 
> of the lsquolsquopower of two choicesrsquorsquo paradigm, whereby an item is 
> stored at the less loaded of two (or more) random alternatives. We then 
> consider how associating a small constant number of hash values with a key 
> can naturally be extended to support other load balancing strategies.')

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to