On Mon, Aug 20, 2012 at 4:55 PM, Eric Evans <eev...@acunu.com> wrote:
> Shuffling the ranges to create a random distribution from contiguous
> ranges has the potential to move a *lot* of data around (all of it,
> basically).  Doing this in an optimal way would mean never moving a
> range more than once.  Since it is a lot of data, and since presumably
> we're expecting normal operations to continue in the meantime, it
> would seem an optimal shuffle would need to maintain state.  For
> example, one machine could serve as the "shuffle coordinator",
> precalculating and persisting all of the moves, starting new transfers
> as existing ones finish, and tracking the progress, etc.

Fortunately, we have a distributed storage system.... :)

Seriously though, creating a CF mapping vnode from->to tuples, writing
the full list of changes in once, and deleting them as they complete,
would be a pretty simple way to get what we want.
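
Something like this sketch, say (Python, with a plain dict standing in
for the CF, and transfer() a made-up placeholder for the actual
streaming):

    # Stand-in for the proposed CF: vnode token -> (from_node, to_node).
    pending = {}

    def plan_shuffle(moves):
        """Write the full list of moves in once, up front."""
        for vnode, src, dst in moves:
            pending[vnode] = (src, dst)

    def run_shuffle(transfer):
        """Start transfers, deleting each entry as it completes.  After a
        coordinator restart, the rows still present are exactly the moves
        that remain."""
        for vnode in list(pending):
            src, dst = pending[vnode]
            transfer(vnode, src, dst)  # placeholder: stream the range
            del pending[vnode]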

Personally though, I don't think the likelihood of "shuffle
coordinator" failure during the operation is *that* high.  I'd be
happy to just assume it works.  But if you want to get fancy, having
it perform its changes in random or round-robin order (instead of
node-at-a-time) would let you "recover" from failure w/o any state at
all, for free: restart the shuffle after a failure, and kill it when
it's "sufficiently" shuffled.
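
To make "round-robin" concrete, here's a hedged sketch (the move-list
shape and node grouping are invented for illustration):

    from itertools import chain, zip_longest

    def round_robin_moves(moves_by_node):
        """Interleave one move per source node at a time, instead of
        draining each node completely (node-at-a-time).
        moves_by_node maps node -> list of moves."""
        lanes = list(moves_by_node.values())
        # zip_longest pads the shorter lists with None; drop the padding.
        return [m for m in chain.from_iterable(zip_longest(*lanes))
                if m is not None]

    # e.g. {"A": [a1, a2], "B": [b1], "C": [c1, c2]}
    #   -> [a1, b1, c1, a2, c2]

The idea being that each pass spreads transfers evenly across the
cluster, so a crashed run leaves the ring partially shuffled
everywhere instead of fully shuffled in one corner, and rerunning from
scratch loses nothing but the duplicated transfers.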

Note that you'd have to have cluster-shuffle fail *twice* before even
the least optimal retrying policy gets worse than node-shuffle, since
node-shuffle will transfer exactly 2x as much data as cluster-shuffle,
on average.
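
Spelling that arithmetic out (taking the 2x figure as given, and the
least optimal retry policy to be one that re-moves everything on every
restart):

    D = 1.0                # data an optimal cluster-shuffle moves, normalized
    node_shuffle = 2 * D   # node-shuffle averages 2x that, per the above

    def restarted_cluster_shuffle(failures):
        # Worst case: every restart re-transfers the full D.
        return (failures + 1) * D

    assert restarted_cluster_shuffle(1) <= node_shuffle  # one failure: no worse
    assert restarted_cluster_shuffle(2) > node_shuffle   # two: finally worse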

My vote would be for a stateless, round-robin cluster-shuffle.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
