On Wed, Aug 19, 2009 at 6:51 PM, Brian Frank Cooper <[email protected]> wrote:
> We are using "RandomPartitioner." However, I have noticed that some of the 
> boxes have significantly more data (in /var/cassandra/data and 
> /var/cassandra/commitlog) than others (like 30 X more).

Ah, with a small number of nodes you should manually space the tokens
evenly around the ring instead of letting each node pick one at
random.  You can do this with the InitialToken directive before
starting the node *for the first time* (afterwards the node stores its
token in the system keyspace, under data/).  The Digg guys should have
a utility done soon to set it post-start, but for now InitialToken is
the only option.
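
A minimal sketch of how evenly spaced tokens could be computed, in
case it helps.  It assumes RandomPartitioner tokens are integers in
the range [0, 2**127); the function name evenly_spaced_tokens and the
constant RING_SIZE are made up for illustration and are not part of
Cassandra.

```python
# Assumption: RandomPartitioner tokens are integers in [0, 2**127).
RING_SIZE = 2 ** 127


def evenly_spaced_tokens(node_count):
    """Return one token per node, spaced equally around the ring.

    Hypothetical helper for illustration; not a Cassandra API.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer division keeps the tokens exact; Python ints are unbounded.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Print one candidate InitialToken value per node of a 4-node ring.
    for index, token in enumerate(evenly_spaced_tokens(4)):
        print(f"node {index}: {token}")
```

Each printed value would go into the InitialToken directive of one
node before that node is started for the first time.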

> Incidentally, the system is quite fun to play with, and the startup is very 
> easy (just start the nodes and they all find each other.) Writing the client 
> (e.g. dealing with thrift) was much harder.

Yeah, Thrift is a pain.  It's the worst possible option except for all
the others. :)  (E.g. Protocol Buffers doesn't do RPC; Avro only does
Java/C/Python, ...)  That's why there are more idiomatic clients for
Python, Ruby, and Scala, at the least.

> I wonder whether a lot of users had tried to write C++ clients

I think you're the first.  Why put yourself through that kind of pain
just to test things out? :)

-Jonathan
