[ https://issues.apache.org/jira/browse/CASSANDRA-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713827#action_12713827 ]
Jonathan Ellis commented on CASSANDRA-192:
------------------------------------------
Hopefully nobody writes about that because it is the easy part. :)
Here is a scheme I think would work:
if node B is assuming part of the range node A is currently responsible for,
then we do it as follows:
Node A notes the range it is transferring and begins anticompaction of its
SSTables. It continues to accept reads and writes for that range but also
forwards writes to B so that anticompaction doesn't need to worry about these
extra writes. When anticompaction is done, it sends the appropriate sections
over to B. When the transfer is complete, B begins to gossip its adjusted token
and A schedules a task to remove the sections B now has at some point in the
future (after we can assume the entire cluster has gotten the new token info --
minutes or hours, not days).
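To make the sequence concrete, here is a rough sketch of A's side of the
handoff. Every class and method name below is illustrative only, not an actual
Cassandra API, and the cleanup delay is an arbitrary placeholder:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RangeHandoffSketch {
    interface Range {}              // portion of the token ring being handed off
    interface SSTableSection {}     // output of anticompaction for that range

    interface LocalNode {
        void startForwardingWrites(Range r, String targetNode); // keep serving, dual-write to B
        SSTableSection anticompact(Range r);                    // split SSTables along the range boundary
        void stream(SSTableSection s, String targetNode);       // send the section over to B
        void deleteLocalCopy(SSTableSection s);                 // discard data B now owns
    }

    static void transferRange(final LocalNode a, Range r, String nodeB) {
        // 1. A keeps accepting reads and writes for the range, but forwards
        //    writes to B so anticompaction does not have to account for them.
        a.startForwardingWrites(r, nodeB);

        // 2. Anticompact the SSTables covering the range and stream the result
        //    to B. (B gossips its adjusted token once it has the data -- not shown.)
        final SSTableSection section = a.anticompact(r);
        a.stream(section, nodeB);

        // 3. Schedule the local cleanup far enough out that the whole cluster
        //    can be assumed to have seen B's new token (hours, not days).
        ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
        cleaner.schedule(new Runnable() {
            public void run() { a.deleteLocalCopy(section); }
        }, 2, TimeUnit.HOURS);
    }
}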
To avoid garbage on either side in the event of a crash before the process is
complete, we can add a check during compaction that throws away data that is
not part of the current node's responsibility (or replica ranges).
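As a sketch, that compaction-time check could look something like the
following; Row, Range, and the ownership lookup are placeholders, not real
Cassandra classes:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CompactionCleanupSketch {
    interface Row { String key(); }
    interface Range { boolean contains(String key); }

    // Ranges this node is responsible for, including ranges it replicates.
    static boolean isLocallyOwned(String key, List<Range> ownedRanges) {
        for (Range r : ownedRanges) {
            if (r.contains(key)) return true;
        }
        return false;
    }

    // During compaction, silently drop rows that fall outside the node's
    // owned ranges -- leftover garbage from an interrupted transfer.
    static List<Row> compactAndClean(Iterator<Row> mergedRows, List<Range> ownedRanges) {
        List<Row> kept = new ArrayList<Row>();
        while (mergedRows.hasNext()) {
            Row row = mergedRows.next();
            if (isLocallyOwned(row.key(), ownedRanges)) {
                kept.add(row);
            }
            // else: the row belongs to a range another node now owns; skip it
        }
        return kept;
    }
}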
This glosses over a bunch of details (do we anticompact memtables too, or just
flush first and let the SSTable code handle it? What if B is already part of
the ring and already gets replicas for the range in question?), but I think the
basic idea is sound.
> Load balancing
> --------------
>
> Key: CASSANDRA-192
> URL: https://issues.apache.org/jira/browse/CASSANDRA-192
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Jonathan Ellis
> Fix For: 0.4
>
>
> We need to be able to spread load evenly across a cluster to mitigate both
> non-uniform key distribution and heterogeneous nodes in a cluster.
> The former is particularly likely to be a problem when using the
> OrderPreservingPartitioner, since the keys are not randomized by a hash
> function.
> Avinash suggested three papers on load balancing in this thread:
> http://groups.google.com/group/cassandra-dev/msg/b3d67acf35801c41
> Of these, the useful ones are
> http://www.iptps.org/papers-2004/karger-load-balance.pdf (Simple Efficient
> Load Balancing Algorithms for Peer-to-Peer Systems by David R. Karger and
> Matthias Ruhl)
> http://iptps03.cs.berkeley.edu/final-papers/load_balancing.ps (Load
> Balancing in Structured P2P Systems by Ananth Rao et al)
> The third,
> http://iptps03.cs.berkeley.edu/final-papers/simple_load_balancing.ps (Simple
> Load Balancing for Distributed Hash Tables by John Byers et al) is not
> applicable to Cassandra's design. ("First, we suggest the direct application
> of the 'power of two choices' paradigm, whereby an item is stored at the less
> loaded of two (or more) random alternatives. We then consider how associating
> a small constant number of hash values with a key can naturally be extended
> to support other load balancing strategies.")
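For reference, the "power of two choices" paradigm quoted above boils down to
something like this generic illustration (plain Java, nothing specific to
Cassandra):

import java.util.Random;

public class TwoChoicesSketch {
    public static void main(String[] args) {
        int nodes = 8;
        int[] load = new int[nodes];
        Random random = new Random();

        for (int item = 0; item < 10000; item++) {
            int first = random.nextInt(nodes);
            int second = random.nextInt(nodes);
            // Store the item at the less loaded of the two random alternatives.
            int chosen = load[first] <= load[second] ? first : second;
            load[chosen]++;
        }

        for (int i = 0; i < nodes; i++) {
            System.out.println("node " + i + ": " + load[i] + " items");
        }
    }
}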