Hey all,

OK, I gave removing the downed node from the Cassandra ring another try.

To recap what's going on, this is what my ring looks like with nodetool
status:

[root@beta-new:~] #nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns   Host ID                               Rack
UN  10.10.1.94  178.38 KB  256     49.4%  fd2f76ae-8dcf-4e93-a37f-bf1e9088696e  rack1
DN  10.10.1.98  ?          256     50.6%  f2a48fc7-a362-43f5-9061-4bb3739fdeaf  rack1

So I followed the steps in this document one more time:

http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html

And set up the following in cassandra.yaml according to the above
instructions:

cluster_name: 'Test Cluster'
num_tokens: 256
seed_provider:
listen_address: 10.10.1.153
auto_bootstrap: yes
broadcast_address: 10.10.1.153
endpoint_snitch: SimpleSnitch
initial_token: -9173731940639284976


The initial_token is the one belonging to the dead node that I'm trying to
get rid of.
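
The seed_provider is just the standard SimpleSeedProvider block; the seeds
line below is my best guess at what it should be, pointing at the live node
rather than the box I'm rebuilding:

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.10.1.94"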

I then make sure that the /var/lib/cassandra directory is completely empty
and run this startup command:

[root@cassandra1 cassandrahome]# ./bin/cassandra
-Dcassandra.replace_address=10.10.1.98 -f

Using the IP of the node I want to remove as the value for
cassandra.replace_address.
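
(I believe the same setting can also go in conf/cassandra-env.sh instead of
on the command line, something along the lines of:

JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.10.1.98"

but I've been passing it on the command line as shown above.)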

And when I do, this is the error I get:

java.lang.RuntimeException: Cannot replace_address /10.10.1.98 because it
doesn't exist in gossip


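On the live node I can at least check what gossip knows about, with something
like:

[root@beta-new:~] #nodetool gossipinfo

which (I'm assuming) would show whether 10.10.1.98 is still being tracked
there at all.
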
So how can I get cassandra to realize that this node needs to be replaced
and that it SHOULDN'T exist in gossip because the node is down? That would
seem obvious to me, so why isn't it obvious to her? :)
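
Or would it be better to just drop the dead node by its Host ID from the live
node, with something like:

[root@beta-new:~] #nodetool removenode f2a48fc7-a362-43f5-9061-4bb3739fdeaf

and then bootstrap the new box normally? Not sure if that's the right call
here.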


Thanks

Tim

On Wed, Jun 4, 2014 at 4:36 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Jun 3, 2014 at 9:03 PM, Matthew Allen <matthew.j.al...@gmail.com>
> wrote:
>
>> Thanks Robert, this makes perfect sense.  Do you know if CASSANDRA-6961
>> will be ported to 1.2.x ?
>>
>
> I just asked driftx, he said "not gonna happen."
>
>
>> And apologies if these appear to be dumb questions, but is a repair more
>> suitable than a rebuild because the rebuild only contacts 1 replica (per
>> range), which may itself contain stale data ?
>>
>
> Exactly that.
>
> https://issues.apache.org/jira/browse/CASSANDRA-2434
>
> Discusses related issues in quite some detail. The tl;dr is that until
> 2434 is resolved, streams do not necessarily come from the node departing
> the range, and therefore the "unique replica count" is decreased by
> changing cluster topology.
>
> =Rob
>
