Re: Working backwards from production to staging/dev

ian douglas Thu, 31 Mar 2011 09:15:54 -0700

Thanks Edward,

Anyone able to provide some answers for the other questions?



On 03/26/2011 07:25 AM, Edward Capriolo wrote:

On Fri, Mar 25, 2011 at 2:11 PM, ian douglas <i...@armorgames.com> wrote:

On 03/25/2011 10:12 AM, Jonathan Ellis wrote:

On Fri, Mar 25, 2011 at 11:59 AM, ian douglas <i...@armorgames.com> wrote:

(we're running v0.60)

I don't know if you could hear that from where you are, but our whole
office just yelled, "WTF!" :)

Ah, that's what that noise was... And yeah, we know we're way behind. We
first delayed upgrading to wait for 0.7 to come out; then we learned we
needed a whole new Thrift client for our PHP code base, and then we got busy
on other things. Now we're at a point where we have some time to take care
of Cassandra and get it upgraded.

Our planned path, now, is:

(our nodes' tokens were generated with the python code (0, 1/3 and 2/3 times
2^127), and the nodes are called node 1 through 3, respectively; our RF is
set to 2 right now)
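The "python code" for the tokens is presumably the usual evenly spaced token calculation for RandomPartitioner, whose token space runs from 0 to 2^127. A minimal equivalent sketch, assuming RandomPartitioner; it uses integer division so the very large tokens stay exact rather than being rounded by floating point.

```python
def initial_tokens(node_count):
    """Evenly spaced RandomPartitioner tokens: i * 2**127 / node_count for each node i."""
    # Integer division (//) keeps these large tokens exact; floats would round them.
    return [i * 2 ** 127 // node_count for i in range(node_count)]


if __name__ == "__main__":
    for n in (3, 2):
        print(f"{n} nodes:", initial_tokens(n))
```

For two nodes this yields 0 and 2^126, which is the (1/2 * 2^127) token planned for the new node 2.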

1. remove node 1 from our software
2. bring node 1 offline after a flush/repair/cleanup
3. run a cleanup on node 2 and then on node 3 so they have a full copy of
all data from the old node 1 and each other.
4. bring up a new Large 64-bit instance, install 0.6.12, assign a Token
value of 0 (node 1), RF:2, on a new gossip ring, and copy all data from the
32-bit nodes 2 and 3 and run a repair/cleanup to remove any duplicated data
5. remove node 3 from our software
6. point our code to the new 64-bit node 1
7. bring node 3 offline after a flush/repair/cleanup so node 2 has the last
fresh copy of everything
8. bring node 2 offline after a flush/repair/cleanup
9. bring up another Large instance, get a copy of all data from our old node
2, assign a Token value of (1/2 * 2^127), RF:2, on the new gossip ring, run
a repair to remove duplicate data, and then a cleanup so it gets replicated
data from the new node 1
10. add the new node 2 to our software
11. run a final cleanup on the new node 1 and then on node 2 to make sure
all data is replicated evenly on both nodes
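To reason about which nodes hold which data at each step, here is a small model of replica placement, assuming the rack-unaware strategy: each token range is stored on the node that owns it (the first node whose token is at or after the key's token, wrapping around the ring) plus the next RF-1 nodes clockwise. The node names and sample tokens are illustrative only. In this model, with RF 2 on the old three-node ring, nodes 2 and 3 together still hold at least one replica of every range once node 1 is gone, but two of the three ranges are then down to a single copy; on a two-node ring with RF 2, both nodes hold every row.

```python
from bisect import bisect_left

RING = 2 ** 127  # size of the RandomPartitioner token space


def replicas_for(key_token, ring, rf):
    """Return the nodes holding key_token under rack-unaware placement:
    the first node whose token is >= key_token (wrapping around the ring),
    followed by the next rf - 1 nodes clockwise."""
    names = sorted(ring, key=ring.get)  # node names in token order
    tokens = [ring[name] for name in names]
    start = bisect_left(tokens, key_token) % len(names)
    return [names[(start + i) % len(names)] for i in range(min(rf, len(names)))]


old_ring = {"node1": 0, "node2": RING // 3, "node3": 2 * RING // 3}
new_ring = {"node1": 0, "node2": RING // 2}

if __name__ == "__main__":
    # One sample token from inside each of the three ranges of the old ring.
    for token in (1, RING // 3 + 1, 2 * RING // 3 + 1):
        holders = replicas_for(token, old_ring, rf=2)
        remaining = [n for n in holders if n != "node1"]
        print(f"token {token}: {holders} -> without node1: {remaining}")

    # On the two-node ring with RF 2, every token lands on both nodes.
    print(replicas_for(1, new_ring, rf=2), replicas_for(RING // 2 + 1, new_ring, rf=2))
```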

... at this point, we should have two 64-bit Large instances, with RF:2, on
a new gossip ring, replacing three 32-bit systems, with minimal downtime
and no data loss (just a data delay between steps 6 and 10 above).

Questions:
1. Does it appear that we've missed any steps, or that we're doing something
out of order?
2. Is the flush/repair/cleanup overkill when bringing the old nodes offline,
or is that the correct sequence to follow?
3. Will the difference in compute units (lower on Large instances than
Medium instances) make any noticeable difference, or will the fact that the
machine is 64-bit make it efficient enough that a Large instance still gets
more work done than a Medium instance? (We never did figure out how their
compute units work.)
4. Can we follow similar steps when we're ready to upgrade to 0.7.x and have
our new Thrift client for PHP all squared away?


Thanks again for the help!!!

If you have a node with an old column family you are not using
anymore... Stop node... delete data... start node.

Edward
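A sketch of the "delete data" step, assuming SSTable files are named `<ColumnFamily>-<generation>-<Component>.db` (for example `Standard1-1-Data.db`) inside a per-keyspace directory under the configured data directory. The paths and the `OldCF` name are hypothetical; check the actual file names on your node before deleting anything, and only do this while the node is stopped. It defaults to a dry run that just lists what would be removed.

```python
import glob
import os
import tempfile


def column_family_files(data_dir, keyspace, column_family):
    """Return files whose names start with '<column_family>-' (e.g.
    Standard1-1-Data.db) in <data_dir>/<keyspace>/, sorted."""
    pattern = os.path.join(
        glob.escape(data_dir), glob.escape(keyspace), glob.escape(column_family) + "-*"
    )
    return sorted(glob.glob(pattern))


def delete_column_family_files(data_dir, keyspace, column_family, dry_run=True):
    """Delete those files (node must be stopped). With dry_run=True,
    only return the list of files that would be removed."""
    files = column_family_files(data_dir, keyspace, column_family)
    if not dry_run:
        for path in files:
            os.remove(path)
    return files


if __name__ == "__main__":
    # Demo against a throwaway directory rather than a real data directory.
    with tempfile.TemporaryDirectory() as data_dir:
        ks_dir = os.path.join(data_dir, "Keyspace1")
        os.makedirs(ks_dir)
        for name in ("OldCF-1-Data.db", "OldCF-1-Index.db", "OldCF-1-Filter.db", "Standard1-1-Data.db"):
            open(os.path.join(ks_dir, name), "w").close()
        print(delete_column_family_files(data_dir, "Keyspace1", "OldCF"))  # dry run
        delete_column_family_files(data_dir, "Keyspace1", "OldCF", dry_run=False)
        print(sorted(os.listdir(ks_dir)))  # ['Standard1-1-Data.db']
```

Matching on the trailing hyphen keeps a column family such as `OldCFArchive` from being caught by a request for `OldCF`.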
