On Tue, Oct 13, 2009 at 4:52 PM, Brian Bulkowski <[email protected]> wrote:

> Question 1:
> the bootstrap parameter: what does it do, exactly?

It's for adding nodes to an existing cluster. (This is being reworked to be more automatic for 0.5.) If you start a node without it, it assumes it already has the data that's "supposed" to be on it (either because you are starting a new cluster, or restarting an existing node in one). If you do specify -b, it will contact the cluster you're starting it in to move the right data to it.

> Question 2:
> "how eventual is eventual?"
> Imagine the following case:
> Defaults from storage-conf.xml + replication count 2 (and the IP addresses required, etc)
> Up server A (no -b)
> Insert a few values, read, all is good (using _cli)
> Up server B, C (with -b)
> read values from A, B, or C - all is good, appears to be reading from A
> wait a few minutes - servers appear quiescent.
> Down server A
> read values from B - values are not available (NPE exception on server & _cli interface)
>
> So I read that Cassandra doesn't optimistically replicate, so I understand in theory that the data inserted to A shouldn't replicate.
> I believe if I used the proper Thrift interface and asked for replication count 2, the transaction would have failed.
> Yet, I expect that if I asked for replication count 2, I should get it. At some point. Eventually. The data has been inserted.
> I expect the cluster to work toward replication count 2 regardless of the current state of the cluster --- is there a way to achieve this behavior?

There are a couple of things going on here. The big one is that after a fresh start A doesn't know what other nodes "should" be part of the cluster. Cassandra does assume that you reach a "good" state before bad stuff starts happening (in production this turns out to be quite reasonable). So what you need to do is bring up A/B/C, then turn things off, rather than just bring up A by itself to start with.

The second one is that there's a different time scale for "eventually" between "eventually, all the replicas are in sync" and "eventually, the failure detector will notice that a node is down and not route requests to it." The first is in milliseconds (with a few caveats, mostly the same as in the Dynamo paper); the second is in seconds (a handful for a small cluster, maybe 15 for a large one). Also note that the replication count isn't something you ask for per request -- that's fixed in storage-conf.xml -- what you pick per request is a consistency level (there's a rough sketch of a quorum write in the P.S. below).

> Question 3:
> "balancing"
> This question is similar to question 2, from a different angle.
> I have three nodes which I brought up at the dawn of time. They've taken a lot of inserts, and have 1T each.
> Let's say the load now is mostly reads, as the data has already been inserted.
> I bring up a fourth node.
> Clients (aka app servers) are pointing at the first 3 nodes. I have to reconfigure those servers to start using the 4th server, right?

Depends on your infrastructure. I'm a fan of round-robin DNS. Or you can use a software or hardware load balancer. Or you can ask the cluster what machines are in it and manually balance from your client app, but that is my least favorite option (there's a trivial client-side rotation sketch in the P.P.S. below).

> New writes may take advantage of the 4th server, but no data will automatically move?

If you specify -b when starting the 4th node, the right data will be copied to it. (Then you need to manually tell the other nodes to "cleanup" -- remove data that doesn't belong on them. This is not automatic, since if the nodes are "missing" as in the answer to #2 you can shoot yourself in the foot here.)

> Thanks for the hints - I'm clearly not "getting" Cassandra yet and don't want to foolishly misrepresent it.

These are not dumb questions. Carry on. :)

-Jonathan
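
P.S. Since "replication count 2" and what the client asks for per request are easy to conflate, here is a rough, untested Python sketch of a quorum write through the 0.4-era Thrift interface. The module names assume bindings generated from interface/cassandra.thrift, and 'Keyspace1'/'Standard1' are just the sample keyspace and column family from the default storage-conf.xml, so adjust to whatever your setup actually has. The point is that with ReplicationFactor 2, a QUORUM write should fail fast (UnavailableException) when only one replica is reachable, instead of quietly landing on A alone.

    import time

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol

    # Generated bindings; exact module names depend on how you ran the
    # thrift generator against interface/cassandra.thrift.
    from cassandra import Cassandra
    from cassandra.ttypes import ColumnPath, ConsistencyLevel

    # Connect to any node in the cluster (9160 is the default Thrift port).
    socket = TSocket.TSocket('192.168.1.1', 9160)
    transport = TTransport.TBufferedTransport(socket)
    client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()

    # With ReplicationFactor 2 in storage-conf.xml, QUORUM means both
    # replicas must acknowledge the write; if one of them is down this
    # raises UnavailableException rather than "succeeding" on one copy.
    client.insert('Keyspace1',                    # keyspace
                  'some-row-key',                 # row key
                  ColumnPath(column_family='Standard1', column='name'),
                  'some value',                   # column value
                  int(time.time() * 1e6),         # client-supplied timestamp
                  ConsistencyLevel.QUORUM)

    transport.close()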
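
P.P.S. On "ask the cluster what machines are in it and manually balance from your client app": the simplest version is just giving the client the full node list and rotating through it, which is all round-robin DNS does for you anyway. A minimal sketch, with a made-up host list, that skips nodes refusing the connection:

    import itertools

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra  # generated bindings, as in the P.S.

    # All four nodes, including the newly bootstrapped one.
    NODES = ['192.168.1.1', '192.168.1.2', '192.168.1.3', '192.168.1.4']
    _rotation = itertools.cycle(NODES)

    def get_client():
        """Return a connected Thrift client, rotating through the node
        list and skipping nodes that are down or unreachable."""
        for _ in range(len(NODES)):
            host = next(_rotation)
            try:
                transport = TTransport.TBufferedTransport(
                    TSocket.TSocket(host, 9160))
                transport.open()
                return Cassandra.Client(
                    TBinaryProtocol.TBinaryProtocol(transport))
            except TTransport.TTransportException:
                continue  # this node didn't answer; try the next one
        raise RuntimeError('no Cassandra nodes reachable')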
