On Wed, Aug 19, 2009 at 6:51 PM, Brian Frank Cooper <[email protected]> wrote:
> We are using "RandomPartitioner." However, I have noticed that some of the
> boxes have significantly more data (in /var/cassandra/data and
> /var/cassandra/commitlog) than others (like 30 X more).

Ah, with small numbers of nodes you should manually space tokens around the ring instead of having them pick one randomly. You can do this w/ the InitialToken directive before starting the node *for the first time* (afterwards it stores it in the system keyspace, under data/). The digg guys should have a utility done soon to set it post-start but that is the only option for now.

> Incidentally, the system is quite fun to play with, and the startup is very
> easy (just start the nodes and they all find each other.) Writing the client
> (e.g. dealing with thrift) was much harder.

Yeah, thrift is a pain. It's the worst possible option except for all the others. :) (E.g. protocol buffers doesn't do RPC; avro only does java/c/python, ...) That's why you have more idiomatic clients for python, ruby, scala, at the least.

> I wonder whether a lot of users had tried to write C++ clients

I think you're the first. Why put yourself through that kind of pain just to test things out? :)

-Jonathan
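
For the manual token spacing suggested above, here is a rough sketch of how evenly spaced InitialToken values could be computed. It assumes RandomPartitioner tokens are integers in the range 0 to 2**127; the function name and the node count of 4 are only illustrative and do not come from the thread.

```python
def evenly_spaced_tokens(node_count, ring_size=2**127):
    """Return one token per node, spaced evenly around the ring.

    ring_size=2**127 is an assumption about RandomPartitioner's token range.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the large token values exact.
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Hypothetical 4-node cluster: print the value to use as InitialToken
    # on each node before it is started for the first time.
    for node, token in enumerate(evenly_spaced_tokens(4)):
        print(f"node {node}: InitialToken = {token}")
```

Each node would get a different value from the list, so that every node ends up owning roughly the same share of the ring.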
