On Tue, Aug 21, 2012 at 2:54 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> On Mon, Aug 20, 2012 at 4:55 PM, Eric Evans <eev...@acunu.com> wrote:
>> Shuffling the ranges to create a random distribution from contiguous
>> ranges has the potential to move a *lot* of data around (all of it,
>> basically).  Doing this in an optimal way would mean never moving a
>> range more than once.  Since it is a lot of data, and since presumably
>> we're expecting normal operations to continue in the meantime, it
>> would seem an optimal shuffle would need to maintain state.  For
>> example, one machine could serve as the "shuffle coordinator",
>> precalculating and persisting all of the moves, starting new transfers
>> as existing ones finish, and tracking the progress, etc.
>
> Fortunately, we have a distributed storage system.... :)
>
> Seriously though, creating a CF mapping vnode from->to tuples,
> throwing in the list of changes to make once, and deleting them out as
> they complete, would be a pretty simple way to get what we want.

Yeah, that's exactly what I had in mind to do.

Actually, now that I think about it, I'd probably drop the entire
notion of a "coordinator", and write the respective entries into a
column family in the system keyspace.  Each node could then work
through its own queue of relocations at its own pace.

What would you think of this approach?

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu
