Re: Cassandra data distribution and configuration settings

Mark Robson Tue, 17 Nov 2009 23:53:18 -0800

2009/11/17 Richard Grossman

> How do I evaluate the value I need to put here?
> The second point is that I have many column families, each with a different
> key. How do I know which token to use to distribute the data?
>


It's not automatic at the moment.

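If you leave Cassandra to pick its own token, it generates one at random from
the character range it uses (I think 0-9a-zA-Z). This is not ideal if your
keys are, say, lowercase hex: the random tokens can easily land in parts of
the key space where you have no keys at all, so the load ends up uneven.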
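To see why random tokens skew the load, here is a rough Python simulation. It assumes tokens are random 4-character strings drawn from 0-9a-zA-Z (the guess above), that keys are 8-digit lowercase hex, and that a key belongs to the node whose token is the first one sorting at or after the key, wrapping around the ring. The function names are hypothetical and this is not Cassandra code.

```python
import bisect
import random
import string
from collections import Counter

# Assumption: the alphabet random tokens are drawn from ("I think 0-9a-zA-Z").
TOKEN_ALPHABET = string.digits + string.ascii_letters
HEX_DIGITS = "0123456789abcdef"


def owner(sorted_tokens, key):
    """Order-preserving lookup: a key belongs to the node with the first token >= key,
    wrapping around to the lowest token if the key sorts after every token."""
    i = bisect.bisect_left(sorted_tokens, key)
    return sorted_tokens[i % len(sorted_tokens)]


def simulate_random_tokens(num_nodes, num_keys=10_000, token_len=4, seed=1):
    """Give each node a random token, then count how many random hex keys each owns."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    tokens = sorted(
        "".join(rng.choice(TOKEN_ALPHABET) for _ in range(token_len))
        for _ in range(num_nodes)
    )
    keys = ("".join(rng.choice(HEX_DIGITS) for _ in range(8)) for _ in range(num_keys))
    load = Counter(owner(tokens, k) for k in keys)
    # Include nodes that received no keys at all, so the skew is visible.
    return {t: load.get(t, 0) for t in tokens}


if __name__ == "__main__":
    for token, count in simulate_random_tokens(4).items():
        print(f"token {token!r}: {count} keys")
```

Only 16 of the 62 token characters can start a hex key, so running this with a few different seeds shows how uneven the split can get.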

The only solution for now is to specify your own tokens.

In 0.5 it seems likely that newly added nodes will auto-bootstrap and balance
the load automatically. The best strategy would then be to start with just one
or two nodes and load a small sample of data before bootstrapping the
remaining ones (presumably so the new nodes can choose their tokens from the
keys that are actually there).
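One way to picture why a loaded sample helps: once real keys exist, tokens can be read off the data's own distribution instead of being guessed. The sketch below picks one token per node as the last key of each equal-sized slice of a sorted sample. It is a hypothetical illustration, not Cassandra's bootstrap code, and it assumes a node owns the keys in (previous token, its token].

```python
def tokens_from_sample(sample_keys, num_nodes):
    """Choose one token per node so that each node owns about the same number of
    sample keys, assuming a node owns the keys in (previous token, its token]."""
    keys = sorted(sample_keys)
    if num_nodes < 1 or len(keys) < num_nodes:
        raise ValueError("need at least one node and one sample key per node")
    # Token i is the last key of the i-th equal-sized slice of the sorted sample.
    return [keys[(i + 1) * len(keys) // num_nodes - 1] for i in range(num_nodes)]


sample = [f"{n:04x}" for n in range(100)]  # stand-in for a loaded data sample
print(tokens_from_sample(sample, 4))  # ['0018', '0031', '004a', '0063']
```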

If you know your keys will be (or start with) a hex number, just set the
tokens to 0, 4, 8 and c (with 4 nodes, for example). Any other values work
too, as long as they're evenly spread across the range your keys occupy.
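The 0, 4, 8, c rule generalises to any node count: space the tokens evenly over the leading hex digits of the keys. A minimal sketch, where `hex_tokens` and its `width` parameter (how many leading hex digits to split on) are hypothetical names:

```python
def hex_tokens(num_nodes, width=1):
    """Evenly spaced tokens over the keys' leading `width` hex digits."""
    space = 16 ** width  # number of distinct hex prefixes of that width
    if not 1 <= num_nodes <= space:
        raise ValueError("num_nodes must be between 1 and 16**width")
    return [format(i * space // num_nodes, f"0{width}x") for i in range(num_nodes)]


print(hex_tokens(4))     # ['0', '4', '8', 'c']
print(hex_tokens(3, 2))  # ['00', '55', 'aa']
```

Assuming a node owns the keys between the previous token and its own, each of the four nodes ends up with four of the sixteen leading digits: node `4` gets keys starting with 0 to 3, and node `0` gets the wrap-around range, keys starting with c to f.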

Choosing keys carefully matters with the ordered partitioner. You presumably
want to be able to do range scans (or you'd use RandomPartitioner), but you
also want your data spread evenly across the nodes.
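The trade-off can be shown in a few lines: under ordered placement a run of adjacent keys stays on one node (good for scans, bad for balance), while hashing scatters it. RandomPartitioner hashes keys with MD5; the `hashed_owner` below simplifies that by assuming the nodes' tokens split the 128-bit hash space into equal slices. Both helpers are hypothetical.

```python
import bisect
import hashlib


def ordered_owner(tokens, key):
    """Order-preserving placement: first token >= key, wrapping around the ring."""
    tokens = sorted(tokens)
    i = bisect.bisect_left(tokens, key)
    return tokens[i % len(tokens)]


def hashed_owner(num_nodes, key):
    """Hash placement in the spirit of RandomPartitioner: the MD5 of the key is
    mapped onto num_nodes equal slices of the 128-bit hash space."""
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")
    return h * num_nodes >> 128  # slice index 0 .. num_nodes - 1


# 100 adjacent keys that a range scan would want to read together.
scan = [f"5{n:03x}" for n in range(100)]
print({ordered_owner(["0", "4", "8", "c"], k) for k in scan})  # {'8'}
print(len({hashed_owner(4, k) for k in scan}))  # almost certainly 4
```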

What I'm planning is to prefix each key with a small hex hash of the customer
id (a part I don't need to range scan), followed by the rest of the key (which
I do need to range scan). That way I can still range scan within a single
customer's data, while the customers are spread more evenly across the nodes.
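A minimal sketch of that key layout, under several assumptions not in the plan itself: the hash is the first two hex digits of an MD5 (256 buckets), the customer id follows the hash so customers sharing a bucket stay separate, ids never contain ':', and `make_key` / `customer_range` are hypothetical helper names.

```python
import hashlib

PREFIX_DIGITS = 2  # hypothetical choice: 2 hex digits = 256 buckets


def make_key(customer_id, rest):
    """Prefix the key with a short hex hash of the customer id (assumed not to
    contain ':'), then the customer id and the range-scannable remainder."""
    bucket = hashlib.md5(customer_id.encode("utf-8")).hexdigest()[:PREFIX_DIGITS]
    return f"{bucket}:{customer_id}:{rest}"


def customer_range(customer_id):
    """Return (start, stop) such that start <= key < stop holds exactly for
    this customer's keys; ';' is the character right after ':' in ASCII."""
    start = make_key(customer_id, "")
    return start, start[:-1] + ";"


keys = sorted(make_key(c, d) for c in ("alice", "bob") for d in ("2009-11-16", "2009-11-17"))
start, stop = customer_range("alice")
print([k for k in keys if start <= k < stop])
```

Within one customer the keys keep the order of `rest` (dates here), so they can still be scanned as a range, while different customers scatter across the buckets and therefore across the nodes.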

Mark
