[ https://issues.apache.org/jira/browse/CASSANDRA-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713827#action_12713827 ]

Jonathan Ellis commented on CASSANDRA-192:
------------------------------------------

Hopefully nobody writes about that because it is the easy part. :)

Here is a scheme I think would work:

If node B is assuming part of the range node A is currently responsible for, 
then we do it as follows:

Node A notes the range it is transferring and begins anticompaction of its 
SSTables.  It continues to accept reads and writes for that range but also 
forwards writes to B so that anticompaction doesn't need to worry about these 
extra writes.  When anticompaction is done, it sends the appropriate sections 
over to B.  When the transfer completes, B begins to gossip its adjusted 
token and A creates a task that will, some time in the future, remove the 
sections B now has (after we can assume the entire cluster has gotten the 
new token info -- minutes or hours, not days).
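
To make the sequencing concrete, here is a rough Java sketch of node A's 
side of that handoff.  Everything in it -- the types, the method names, the 
two-hour delay -- is made up for illustration, not actual Cassandra code:

    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class RangeTransferSketch {
        // All types and methods below are invented for illustration.
        record Range(String left, String right) {}
        record SSTableSection(String path) {}
        interface Node { void stream(List<SSTableSection> sections); }

        private final ScheduledExecutorService cleanup =
                Executors.newSingleThreadScheduledExecutor();

        public void transferRange(Range range, Node b) {
            // 1. Note the range and start forwarding its writes to B while
            //    still serving reads and writes locally; anticompaction can
            //    then ignore writes that arrive after this point.
            startForwardingWrites(range, b);

            // 2. Anticompact: rewrite local SSTables so the data for the
            //    transferred range ends up in its own contiguous sections.
            List<SSTableSection> sections = anticompact(range);

            // 3. Ship those sections to B; B then starts gossiping its
            //    adjusted token.
            b.stream(sections);

            // 4. Delete the local copies only once the whole cluster can be
            //    assumed to have the new token info (hours, not days).
            cleanup.schedule(() -> deleteLocal(sections), 2, TimeUnit.HOURS);
        }

        private void startForwardingWrites(Range r, Node b) { /* stub */ }
        private List<SSTableSection> anticompact(Range r) { return List.of(); }
        private void deleteLocal(List<SSTableSection> sections) { /* stub */ }
    }

The point of step 1 is that the forwarded writes mean the anticompaction 
output never has to be patched up: nothing written after it starts needs to 
be in the streamed sections.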

To avoid garbage on either side in the event of a crash before the process is 
complete, we can add a check during compaction that throws away data that is 
not part of the current node's responsibility (or replica ranges).
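
In code terms, that check could look something like this (Range, Row, and 
localRanges are again made-up names):

    import java.util.List;

    public class CompactionFilterSketch {
        /** Half-open ring range (left, right]; wraparound ignored for brevity. */
        record Range(long left, long right) {
            boolean contains(long token) { return token > left && token <= right; }
        }
        record Row(long token) {}

        /** Keep a row only if a range this node owns or replicates covers it. */
        static boolean shouldKeep(Row row, List<Range> localRanges) {
            for (Range r : localRanges) {
                if (r.contains(row.token())) {
                    return true;
                }
            }
            return false;  // leftover from an interrupted transfer; discard
        }
    }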

This glosses over a bunch of details (do we anticompact memtables too, or just 
flush first and let the sstable code go over it?  what if B is already part of 
the ring that gets replicas for the range in question?) but I think the basic 
idea is sound.

> Load balancing
> --------------
>
>                 Key: CASSANDRA-192
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-192
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>             Fix For: 0.4
>
>
> We need to be able to spread load evenly across a cluster to mitigate both 
> non-uniform key distribution and heterogeneous nodes in a cluster.  
> The former is particularly likely to be a problem when using the 
> OrderPreservingPartitioner, since the keys are not randomized by a hash 
> function.
> Avinash suggested three papers on load balancing in this thread: 
> http://groups.google.com/group/cassandra-dev/msg/b3d67acf35801c41
> Of these, the useful ones are
>  http://www.iptps.org/papers-2004/karger-load-balance.pdf (Simple Efficient 
> Load Balancing Algorithms for Peer-to-Peer Systems by David R. Karger and 
> Matthias Ruhl)
>  http://iptps03.cs.berkeley.edu/final-papers/load_balancing.ps (Load 
> Balancing in Structured P2P Systems by Ananth Rao et al)
> The third, 
> http://iptps03.cs.berkeley.edu/final-papers/simple_load_balancing.ps (Simple 
> Load Balancing for Distributed Hash Tables by John Byers et al) is not 
> applicable to Cassandra's design.  ("First, we suggest the direct application 
> of the 'power of two choices' paradigm, whereby an item is stored at the less 
> loaded of two (or more) random alternatives. We then consider how associating 
> a small constant number of hash values with a key can naturally be extended 
> to support other load balancing strategies.")
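
For reference, the "power of two choices" paradigm quoted above amounts to 
probing two random nodes and storing the item on the less loaded one.  A 
generic sketch of that idea (not Cassandra code):

    import java.util.List;
    import java.util.Random;

    public class TwoChoicesSketch {
        private static final Random RNG = new Random();

        /** Probe two random nodes; place the item on the less loaded one. */
        static int pickNode(List<Integer> nodeLoads) {
            int a = RNG.nextInt(nodeLoads.size());
            int b = RNG.nextInt(nodeLoads.size());
            return nodeLoads.get(a) <= nodeLoads.get(b) ? a : b;
        }
    }

As the description says, this relies on being able to store an item at one 
of several hash locations, which is why it does not map onto Cassandra's 
design.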

