[ 
https://issues.apache.org/jira/browse/CASSANDRA-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714582#action_12714582
 ] 

Jun Rao commented on CASSANDRA-192:
-----------------------------------

The basic idea looks right. A few other details.

1. How to handle replicas? At the minimum, it seems that when the data transfer 
from A to B is over, B needs to send delete requests to not only A, but also 
nodes that replicates part of A's data.

2. How to coordinate with hinted handoff data? What happens when the ring 
tokens have changed but there are still some old hinted data delivered to A? Do 
we want A to forward those data to B?
 

> Load balancing
> --------------
>
>                 Key: CASSANDRA-192
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-192
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>             Fix For: 0.4
>
>
> We need to be able to spread load evenly across a cluster to mitigate keys 
> not being uniformly distributed as well as heterogeneous nodes in a cluster.  
> The former is particularly likely to be a problem when using the 
> OrderPreservingPartitioner, since the keys are not randomized by a hash 
> function.
> Avinash suggested three papers on load balancing in this thread: 
> http://groups.google.com/group/cassandra-dev/msg/b3d67acf35801c41
> Of these, the useful ones are
>  http://www.iptps.org/papers-2004/karger-load-balance.pdf (Simple Efficient 
> Load Balancing Algorithms for Peer-to-Peer Systems by David R. Karger and 
> Matthias Ruhl)
>  http://iptps03.cs.berkeley.edu/final-papers/load_balancing.ps (Load 
> Balancing in Structured P2P Systems by Ananth Rao et al)
> The third, 
> http://iptps03.cs.berkeley.edu/final-papers/simple_load_balancing.ps (Simple 
> Load Balancing for Distributed Hash Tables by John Byers et al) is not 
> applicable to Cassandra's design.  ("First, we suggest the direct application 
> of the 'power of two choices' paradigm, whereby an item is stored at the less 
> loaded of two (or more) random alternatives. We then consider how associating 
> a small constant number of hash values with a key can naturally be extended 
> to support other load balancing strategies.")

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to