On 17 June 2010 20:08, Lev Stesin <lev.ste...@gmail.com> wrote:
> Hi,
>
> What is the correct procedure to create a well balanced cluster (in
> terms of key distribution). From what I understand whenever I add a
> new node its takes half from its neighbor. How can I make each node to
> contain 1/3 of the keys in a 3 node cluster? Thanks.
>
It depends on what replication factor you use. With a replication factor of three on a three-node cluster, every key is stored on all three nodes, so the cluster is always well balanced. With a factor of two, each key is stored on two nodes and the cluster may not be well balanced.

It also depends on which partitioner you use and how your application chooses its keys. If you use the RandomPartitioner and have a large number of keys with data evenly spread across them, you have nothing to worry about. If you use OrderPreservingPartitioner then you can easily run into trouble.

Normally the tokens are chosen semi-randomly for the initial nodes, and nodes added later pick tokens that subdivide the existing ranges during bootstrap. If you have keys which steadily increase (e.g. time-based) then you tend to get unbalanced clusters anyway. You should then probably work out which part of the key you really need to do range scans on, and consider hashing the part that you don't.

Mark
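
P.S. To make the last point concrete, here is a minimal sketch in Python of one way to hash the part of a key you don't need to scan over. It does not use any Cassandra API. The key layout, the names make_row_key and scan, and the "sensor" ids are all hypothetical, chosen only for illustration. The assumption is that you only need range scans over time within one sensor, so the sensor id gets a short hashed prefix (which spreads sensors across the ordered key space) while the zero-padded timestamp stays in order.

```python
import bisect
import hashlib


def make_row_key(sensor_id: str, timestamp: int) -> str:
    """Build a key as <hash prefix>:<sensor id>:<zero-padded timestamp>.

    Hypothetical layout: the prefix is derived from the sensor id only,
    so all keys of one sensor stay adjacent and sorted by time.
    Assumes sensor ids contain no ':' and timestamps fit in 12 digits.
    """
    prefix = hashlib.md5(sensor_id.encode("utf-8")).hexdigest()[:4]
    return f"{prefix}:{sensor_id}:{timestamp:012d}"


def scan(sorted_keys, sensor_id, start, end):
    """Return the keys of one sensor with start <= timestamp <= end.

    sorted_keys stands in for an order-preserving key space.
    """
    lo = bisect.bisect_left(sorted_keys, make_row_key(sensor_id, start))
    hi = bisect.bisect_right(sorted_keys, make_row_key(sensor_id, end))
    return sorted_keys[lo:hi]


if __name__ == "__main__":
    keys = sorted(
        make_row_key(f"sensor-{s}", t)
        for s in range(4)
        for t in range(1000, 1010)
    )
    # Time-range scan for a single sensor still works on the ordered keys.
    for key in scan(keys, "sensor-2", 1003, 1006):
        print(key)
```

New writes no longer all land at the end of the key range, because each sensor has its own region of the key space. The price is that you can no longer scan across all sensors in time order with a single range query.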