Greetings,

I'm evaluating Cassandra, like others. I've scrubbed through the mail digest and blog posts and whatnot, and I've seen my question asked but I'm not clear on the answers.

I'm doing what others have done: using 3 servers and doing a few test inserts to understand the data and consistency model.

Question 1:
   The bootstrap parameter: what does it do, exactly?
   It seems the right thing to do, just playing around, is to start the first node with no bootstrap, and the other two with bootstrap.
   But I don't know the hows or whys.

Question 2:
   "how eventual is eventual?"
   Imagine the following case:
      Defaults from storage-conf.xml + replication count 2 (and the IP addresses required, etc)
      Up server A (no -b)
      Insert a few values, read, all is good (using _cli)
      Up server B, C (with -b)
      Read values from A, B, or C - all is good, appears to be reading from A
      Wait a few minutes - servers appear quiescent.
      Down server A
      Read values from B - values are not available (NPE on the server and in the _cli interface)
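
To make the steps above concrete, here is a toy Python model of what I think I am seeing. It is not Cassandra code and not a claim about how Cassandra works internally: ToyNode, ToyCluster, and run_scenario are made-up names, and the model simply assumes that a write lands only on nodes that are up at insert time and is never copied afterwards.

```python
class ToyNode:
    """Hypothetical stand-in for one server; not a Cassandra class."""

    def __init__(self, name):
        self.name = name
        self.up = False
        self.data = {}


class ToyCluster:
    """Toy model: writes go to live nodes only, and nothing is backfilled."""

    def __init__(self, replication_count):
        self.replication_count = replication_count
        self.nodes = {}  # insertion order is the order nodes were first started

    def start(self, name):
        if name not in self.nodes:
            self.nodes[name] = ToyNode(name)
        self.nodes[name].up = True

    def stop(self, name):
        self.nodes[name].up = False

    def insert(self, key, value):
        # Assumption: only nodes that are up right now can take the write.
        live = [n for n in self.nodes.values() if n.up]
        targets = live[: self.replication_count]
        for node in targets:
            node.data[key] = value
        return [node.name for node in targets]

    def read(self, key):
        # Any live node may serve the read by forwarding to a live holder.
        for node in self.nodes.values():
            if node.up and key in node.data:
                return node.data[key]
        return None  # no live node holds the key


def run_scenario():
    cluster = ToyCluster(replication_count=2)
    cluster.start("A")
    cluster.insert("k", "v")        # only A is up, so only A stores it
    only_a = cluster.read("k")
    cluster.start("B")
    cluster.start("C")
    all_up = cluster.read("k")      # still served from A's copy
    cluster.stop("A")
    a_down = cluster.read("k")      # B and C never received the value
    return only_a, all_up, a_down
```

In this model the value is readable until A goes down and then disappears, which matches what I observed. What I am asking about is the missing step: something that would copy the value to a second node once B and C are up.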

I've read that Cassandra doesn't optimistically replicate, so I understand in theory why the data inserted on A shouldn't have replicated. I believe that if I had used the proper Thrift interface and asked for replication count 2, the insert would have failed. Yet I expect that if I ask for replication count 2, I should get it. At some point. Eventually. The data has been inserted, and I expect the cluster to work toward replication count 2 regardless of its current state. Is there a way to achieve this behavior?

Question 3:
   "balancing"
      This question is similar to question 2, from a different angle.
      I have three nodes which I brought up at the dawn of time. They've taken a lot of inserts, and hold 1 TB each. Let's say the load now is mostly reads, as the data has already been inserted.
      I bring up a fourth node.
      Clients (aka app servers) are pointing at the first 3 nodes. I have to reconfigure those servers to start using the 4th server, right? New writes may take advantage of the 4th server, but no data will automatically move? Which would mean that the servers would be out of balance, perhaps for a long time, perhaps forever?

Thanks for the hints - I'm clearly not "getting" Cassandra yet and don't want to foolishly misrepresent it.

Thanks,
-brianb

