On 17 June 2010 20:08, Lev Stesin <lev.ste...@gmail.com> wrote:

> Hi,
>
> What is the correct procedure to create a well balanced cluster (in
> terms of key distribution)? From what I understand, whenever I add a
> new node it takes half from its neighbor. How can I make each node
> contain 1/3 of the keys in a 3 node cluster? Thanks.
>

It depends what replication factor you use.

If you use a replication factor of three, all data goes into all three of
your nodes, hence it will always be well-balanced.

If you use a factor of two, all data goes into two nodes and it may not be
well balanced. It depends on which partitioner you use and how you select
your keys in your application. If you use the random partitioner and have a
large number of keys with data evenly spread, you have nothing to worry
about.
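
For the three-node case in the question, one way to get an even split with the random partitioner is to give each node an evenly spaced initial token instead of letting it pick one. The following is only a rough sketch: it assumes the random partitioner's token space runs from 0 to 2**127, and the helper name balanced_tokens is made up for illustration.

```python
# Sketch: evenly spaced tokens for a ring using the random partitioner.
# Assumption: the token space is 0 .. 2**127. The helper name is hypothetical.

RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return one token per node, spaced evenly around the ring."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for node, token in enumerate(balanced_tokens(3)):
        print(f"node {node}: token {token}")
```

Each value would go into the corresponding node's initial token setting before that node first starts, so that every node owns an equal slice of the ring.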

If you use OrderPreservingPartitioner then you can easily run into trouble.
Normally the tokens are chosen semi-randomly at first, and later ones are
chosen during bootstrap to subdivide the existing ranges.
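
To see why subdividing at bootstrap does not give you thirds, here is a small simulation. It is a simplification, not the real bootstrap code: it assumes each new node takes exactly half of the range of whichever node currently holds the most, and the function name bootstrap_shares is hypothetical.

```python
from fractions import Fraction


def bootstrap_shares(node_count):
    """Simulate adding nodes one at a time.

    Assumption: each new node takes half of the currently largest share.
    Returns the fraction of the ring owned by each node.
    """
    shares = [Fraction(1)]  # a single node owns the whole ring
    while len(shares) < node_count:
        # Index of the node holding the largest share (first one on ties).
        biggest = max(range(len(shares)), key=lambda i: shares[i])
        half = shares[biggest] / 2
        shares[biggest] = half
        shares.insert(biggest + 1, half)  # the new node sits next to it
    return shares


if __name__ == "__main__":
    print([str(s) for s in bootstrap_shares(3)])
```

Under that assumption three nodes end up with 1/4, 1/4 and 1/2 of the ring, and the split only becomes even again at four nodes.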

If you have keys which steadily increase (e.g. time-based) then you tend to
get unbalanced clusters anyway. You should then probably consider which part
of the key you really need to do range scans on, and consider hashing the
part that you don't.
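
As an illustration of hashing only part of the key, the sketch below builds a row key from a hashed prefix plus a zero-padded timestamp. The key layout, the make_row_key name and the source_id field are all made up for this example; your application's keys will differ.

```python
import hashlib


def make_row_key(source_id, timestamp_ms):
    """Build a key whose prefix is hashed and whose suffix stays sortable.

    Hypothetical layout: <8 hex chars of md5(source_id)>:<13-digit timestamp>.
    The hashed prefix spreads different sources around the ring, while keys
    for one source still sort by time, so range scans on time keep working.
    """
    prefix = hashlib.md5(source_id.encode("utf-8")).hexdigest()[:8]
    return f"{prefix}:{timestamp_ms:013d}"


if __name__ == "__main__":
    print(make_row_key("sensor-a", 1276805280000))
    print(make_row_key("sensor-a", 1276805290000))
```

The trade-off is that you can no longer range scan across sources, only within one, so this only helps if the hashed part really is one you never scan on.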

Mark
