2009/11/17 Richard Grossman <[email protected]>

> How do I evaluate the value I need to put here ??
> The second point is that I've many column family each with a different key
> then how do I know what is the token to distribute the data ??
>
It's not automatic at the moment. If you leave a node to make its own token, it'll pick one at random from the character range it uses (I think 0-9a-zA-Z). This is not ideal if you're using, say, lowercase hex keys, since many of those random tokens would fall outside the range your keys actually occupy. The only solution for now is to specify your own tokens.

For 0.5 it seems likely that new nodes will auto-bootstrap and load balance automatically, so the best strategy would be to start with just one or two nodes, load a small sample of data, and then bootstrap the remaining ones.

If you know your keys will be (or start with) a hex number, then just set the tokens to 0, 4, 8, c (if you have 4 nodes, for example). Or anything really, as long as they're evenly distributed.

Choosing keys correctly is important for the ordered partitioner. You presumably want to be able to do range scans (or you'd use RandomPartitioner), but you also want your data to be spread out.

What I've got planned is to put a small hex hash of the customer id at the beginning of the key (a part I don't need to range scan), followed by the rest of the key (the part I do need to range scan). That means I can still range scan within, e.g., one customer's data, but the customers will be spread more evenly across the nodes.

Mark
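A rough sketch of the two ideas above, in Python: picking evenly spaced hex tokens for N nodes, and building a key with a short hex hash of the customer id in front. None of this is Cassandra API. The function names, the use of MD5, the two-character prefix, and the "prefix:customer:rest" layout are all assumptions made for illustration.

```python
import hashlib


def hex_tokens(num_nodes, width=1):
    """Evenly spaced lowercase hex tokens, one per node (hypothetical helper).

    width is the number of hex digits per token, so the token space
    has 16 ** width possible values.
    """
    space = 16 ** width
    return [format(i * space // num_nodes, "0{}x".format(width))
            for i in range(num_nodes)]


def customer_prefix(customer_id, prefix_len=2):
    """Short hex hash of the customer id (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    return digest[:prefix_len]


def prefixed_key(customer_id, rest, prefix_len=2):
    """Row key laid out as <hex prefix>:<customer id>:<rest of key>.

    The prefix spreads customers across the token range; everything
    after it keeps its natural order, so one customer's rows stay
    contiguous and can still be range scanned.
    """
    return "{}:{}:{}".format(
        customer_prefix(customer_id, prefix_len), customer_id, rest)


if __name__ == "__main__":
    print(hex_tokens(4))
    print(prefixed_key("customer42", "2009-11-17"))
```

With 4 nodes and one hex digit this gives the tokens 0, 4, 8, c. To scan one customer's data you would range scan over keys starting with that customer's prefix and id; the customer id is kept in the key because two customers can share the same short hash.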
