Hey all, OK, I gave removing the downed node from the Cassandra ring another try.
To recap what's going on, this is what my ring looks like with nodetool status:

    [root@beta-new:~]# nodetool status
    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address     Load       Tokens  Owns   Host ID                               Rack
    UN  10.10.1.94  178.38 KB  256     49.4%  fd2f76ae-8dcf-4e93-a37f-bf1e9088696e  rack1
    DN  10.10.1.98  ?          256     50.6%  f2a48fc7-a362-43f5-9061-4bb3739fdeaf  rack1

So I followed the steps in this document one more time:

http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html

And set up the following in cassandra.yaml according to the above instructions:

    cluster_name: 'Test Cluster'
    num_tokens: 256
    seed_provider:
    listen_address: 10.10.1.153
    auto_bootstrap: yes
    broadcast_address: 10.10.1.153
    endpoint_snitch: SimpleSnitch
    initial_token: -9173731940639284976

The initial_token is the one belonging to the dead node that I'm trying to get rid of. I then made sure that the /var/lib/cassandra directory was completely empty and ran this startup command, using the IP of the node I want to replace as the value of cassandra.replace_address:

    [root@cassandra1 cassandrahome]# ./bin/cassandra -Dcassandra.replace_address=10.10.1.98 -f

And when I do, this is the error I get:

    java.lang.RuntimeException: Cannot replace_address /10.10.1.98 because it doesn't exist in gossip

So how can I get Cassandra to realize that this node needs to be replaced, and that it SHOULDN'T exist in gossip because the node is down? That would seem obvious to me, so why isn't it obvious to her? :)

Thanks
Tim

On Wed, Jun 4, 2014 at 4:36 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Jun 3, 2014 at 9:03 PM, Matthew Allen <matthew.j.al...@gmail.com>
> wrote:
>
>> Thanks Robert, this makes perfect sense. Do you know if CASSANDRA-6961
>> will be ported to 1.2.x?
>
> I just asked driftx, he said "not gonna happen."
>> And apologies if these appear to be dumb questions, but is a repair more
>> suitable than a rebuild because the rebuild only contacts 1 replica (per
>> range), which may itself contain stale data?
>
> Exactly that.
>
> https://issues.apache.org/jira/browse/CASSANDRA-2434
>
> Discusses related issues in quite some detail. The tl;dr is that until
> 2434 is resolved, streams do not necessarily come from the node departing
> the range, and therefore the "unique replica count" is decreased by
> changing cluster topology.
>
> =Rob
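
For anyone finding this thread in the archives: the repair-vs-rebuild distinction Rob describes maps to the following nodetool invocations. This is only a sketch — "my_keyspace" is a placeholder, and both commands assume a live node of this era (1.2/2.0):

    # rebuild streams each range from a single replica, which may itself
    # hold stale data (it is intended for populating a new datacenter):
    nodetool rebuild

    # repair compares all replicas for each range and reconciles any
    # differences; -pr restricts it to this node's primary ranges:
    nodetool repair -pr my_keyspace
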