Greetings,
Like others, I'm evaluating Cassandra. I've combed through the mailing
list digest, blog posts, and so on, and I've seen my questions asked,
but I'm not clear on the answers.
I'm doing what others have done: running 3 servers and doing a few test
inserts to understand the data and consistency model.
Question 1:
The bootstrap parameter (-b): what does it do, exactly?
From playing around, the right approach seems to be to start the
first node without bootstrap and the other two with bootstrap.
But I don't understand how or why.
Question 2:
"how eventual is eventual?"
Imagine the following case, using the defaults from storage-conf.xml
plus replication count 2 (and the required IP addresses, etc.):
1. Bring up server A (no -b).
2. Insert a few values and read them back; all is good (using _cli).
3. Bring up servers B and C (with -b).
4. Read values from A, B, or C: all is good, and the reads appear to
   be served from A.
5. Wait a few minutes; the servers appear quiescent.
6. Take server A down.
7. Read values from B: the values are not available (NPE on the
   server and in the _cli interface).
I've read that Cassandra doesn't replicate optimistically, so I
understand that, in theory, the data inserted into A shouldn't replicate.
I believe that if I had used the proper Thrift interface and asked for
replication count 2, the write would have failed.
Yet I expect that if I ask for replication count 2, I should get it.
At some point. Eventually. The data has been inserted.
I expect the cluster to work toward replication count 2 regardless of
its current state. Is there a way to achieve this behavior?
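To make the scenario concrete, here is a simplified model of replica
placement as I understand it. It is a sketch under stated assumptions,
not Cassandra's code: each node owns the range of tokens ending at its
own token, a key hashes to a token (MD5 here, standing in for a random
partitioner), and the replicas are the first rf nodes found walking
clockwise from the key's token. key_token, replicas, and the tiny token
values are hypothetical.

```python
import bisect
import hashlib

def key_token(key: str, ring_size: int = 2**127) -> int:
    """Hypothetical token function: MD5 of the key, reduced into the token space.

    A simplified stand-in for a random partitioner, not Cassandra's code.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ring_size

def replicas(key_tok: int, ring: dict[int, str], rf: int) -> list[str]:
    """Return up to rf distinct nodes, walking clockwise from the key's token.

    Assumed model: the first node whose token is >= key_tok is the primary
    replica; past the highest token we wrap around to the lowest.
    """
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, key_tok) % len(tokens)
    count = min(rf, len(tokens))  # can't place more replicas than there are nodes
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(count)]

# Scenario from the question, with tiny made-up tokens for readability.
only_a = {0: "A"}
print(replicas(50, only_a, rf=2))  # ['A']: only one copy can exist
abc = {0: "A", 100: "B", 200: "C"}
print(replicas(50, abc, rf=2))     # ['B', 'C']: A is no longer a preferred replica
```

Under this model, a key with token 50 written while A was alone has
exactly one copy, on A. Once B and C join, its preferred replicas are B
and C, so unless something copies the data over, taking A down leaves
no live node holding it. That copying step is what I'm asking about.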
Question 3:
"balancing"
This question is similar to question 2, approached from a different angle.
I have three nodes which I brought up at the dawn of time.
They've taken a lot of inserts and hold 1 TB each.
Let's say the load is now mostly reads, since the data has already
been inserted.
I bring up a fourth node.
Clients (i.e., the app servers) are pointing at the first 3 nodes, so I
have to reconfigure them to start using the 4th server, right?
New writes may take advantage of the 4th server, but no existing data
will move automatically?
That would mean the servers are out of balance, perhaps for a long
time, perhaps forever?
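Here is a back-of-the-envelope version of the balancing question, under
the same kind of simplified ring model (an assumption, not Cassandra's
code): each node owns the token range ending at its own token, and the
fourth node's token lands in the middle of one existing range.
ownership is a hypothetical helper, and the 600-token ring is only
there to keep the numbers readable.

```python
from fractions import Fraction

RING_SIZE = 2**127  # assumed size of the token space, for illustration only

def ownership(ring: dict[int, str], ring_size: int = RING_SIZE) -> dict[str, Fraction]:
    """Return the fraction of the token space each node owns as its primary range.

    Simplified model (assumption): a node owns (previous token, its token],
    wrapping around at ring_size.
    """
    tokens = sorted(ring)
    if len(tokens) == 1:
        # A lone node owns the whole ring.
        return {ring[tokens[0]]: Fraction(1)}
    shares = {}
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1]  # for i == 0 this wraps to the highest token
        width = (tok - prev) % ring_size
        shares[ring[tok]] = Fraction(width, ring_size)
    return shares

three = {0: "A", 200: "B", 400: "C"}
print(ownership(three, 600))  # A, B, C each own 1/3
four = {**three, 100: "D"}    # D bisects B's range (0, 200]
print(ownership(four, 600))   # A: 1/3, D: 1/6, B: 1/6, C: 1/3
```

If that model is right, a single new node relieves only the one node
whose range it splits, while the other two keep their full third. That
is why I'm asking how rebalancing is meant to work.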
Thanks for any hints. I'm clearly not "getting" Cassandra yet, and I
don't want to misrepresent it out of ignorance.
Thanks,
-brianb