On Tue, Oct 13, 2009 at 4:52 PM, Brian Bulkowski <[email protected]> wrote:

> Question 1:
> the bootstrap parameter: what does it do, exactly?

It's for adding nodes to an existing cluster. (This is being reworked to be more automatic for 0.5.) If you start a node without it, it assumes it already has the data that's "supposed" to be on it (either because you are starting a new cluster, or restarting an existing node in one). If you do specify -b, it will contact the cluster you're starting it in to move the right data to it.

> Question 2:
> "how eventual is eventual?"
> Imagine the following case:
> Defaults from storage-conf.xml + replication count 2 (and the IP addresses required, etc)
> Up server A (no -b)
> Insert a few values, read, all is good (using _cli)
> Up server B, C (with -b)
> read values from A, B, or C - all is good, appears to be reading from A
> wait a few minutes - servers appear quiescent.
> Down server A
> read values from B - values are not available (NPE exception on server & _cli interface)
>
> So I read that Cassandra doesn't optimistically replicate, so I understand in theory that the data inserted to A shouldn't replicate.
> I believe if I used the proper Thrift interface and asked for replication count 2, the transaction would have failed.
> Yet, I expect that if I asked for replication count 2, I should get it. At some point. Eventually. The data has been inserted.
> I expect the cluster to work toward replication count 2 regardless of the current state of the cluster --- is there a way to achieve this behavior?

There are a couple of things going on here. The big one is that after a fresh start A doesn't know what other nodes "should" be part of the cluster. Cassandra does assume that you reach a "good" state before bad stuff starts happening (in production this turns out to be quite reasonable). So what you need to do is bring up A/B/C, then turn things off, rather than just bring up A by itself to start with.

The second one is that there's a different time scale for "eventually" between "eventually, all the replicas are in sync" and "eventually, the failure detector will notice that a node is down and not route requests to it." The first is in milliseconds (with a few caveats, mostly the same as in the Dynamo paper); the second is in seconds (a handful for a small cluster, maybe 15 for a large one). Also note that the replication count isn't something you ask for per request -- that's fixed in storage-conf.xml -- what you pick per request is a consistency level (there's a rough sketch of a quorum write in the P.S. below).

> Question 3:
> "balancing"
> This question is similar to question 2, from a different angle.
> I have three nodes which I brought up at the dawn of time. They've taken a lot of inserts, and have 1T each.
> Let's say the load now is mostly reads, as the data has already been inserted.
> I bring up a fourth node.
> Clients (aka app servers) are pointing at the first 3 nodes. I have to reconfigure those servers to start using the 4th server, right?

Depends on your infrastructure. I'm a fan of round-robin DNS. Or you can use a software or hardware load balancer. Or you can ask the cluster what machines are in it and manually balance from your client app, but that is my least favorite option (there's a trivial client-side rotation sketch in the P.P.S. below).

> New writes may take advantage of the 4th server, but no data will automatically move?

If you specify -b when starting the 4th node, the right data will be copied to it. (Then you need to manually tell the other nodes to "cleanup" -- remove data that doesn't belong on them. This is not automatic, since if the nodes are "missing" as in the answer to #2 you can shoot yourself in the foot here.)

> Thanks for the hints - I'm clearly not "getting" Cassandra yet and don't want to foolishly misrepresent it.

These are not dumb questions. Carry on. :)

-Jonathan
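
P.S. Since "replication count 2" and what the client asks for per request are easy to conflate, here is a rough, untested Python sketch of a quorum write through the 0.4-era Thrift interface. The module names assume bindings generated from interface/cassandra.thrift, and 'Keyspace1'/'Standard1' are just the sample keyspace and column family from the default storage-conf.xml, so adjust to whatever your setup actually has. The point is that with ReplicationFactor 2, a QUORUM write should fail fast (UnavailableException) when only one replica is reachable, instead of quietly landing on A alone.

    import time

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol

    # Generated bindings; exact module names depend on how you ran the
    # thrift generator against interface/cassandra.thrift.
    from cassandra import Cassandra
    from cassandra.ttypes import ColumnPath, ConsistencyLevel

    # Connect to any node in the cluster (9160 is the default Thrift port).
    socket = TSocket.TSocket('192.168.1.1', 9160)
    transport = TTransport.TBufferedTransport(socket)
    client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()

    # With ReplicationFactor 2 in storage-conf.xml, QUORUM means both
    # replicas must acknowledge the write; if one of them is down this
    # raises UnavailableException rather than "succeeding" on one copy.
    client.insert('Keyspace1',                    # keyspace
                  'some-row-key',                 # row key
                  ColumnPath(column_family='Standard1', column='name'),
                  'some value',                   # column value
                  int(time.time() * 1e6),         # client-supplied timestamp
                  ConsistencyLevel.QUORUM)

    transport.close()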
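
P.P.S. On "ask the cluster what machines are in it and manually balance from your client app": the simplest version is just giving the client the full node list and rotating through it, which is all round-robin DNS does for you anyway. A minimal sketch, with a made-up host list, that skips nodes refusing the connection:

    import itertools

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra  # generated bindings, as in the P.S.

    # All four nodes, including the newly bootstrapped one.
    NODES = ['192.168.1.1', '192.168.1.2', '192.168.1.3', '192.168.1.4']
    _rotation = itertools.cycle(NODES)

    def get_client():
        """Return a connected Thrift client, rotating through the node
        list and skipping nodes that are down or unreachable."""
        for _ in range(len(NODES)):
            host = next(_rotation)
            try:
                transport = TTransport.TBufferedTransport(
                    TSocket.TSocket(host, 9160))
                transport.open()
                return Cassandra.Client(
                    TBinaryProtocol.TBinaryProtocol(transport))
            except TTransport.TTransportException:
                continue  # this node didn't answer; try the next one
        raise RuntimeError('no Cassandra nodes reachable')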
