There was no need to assassinate in this case. 'nodetool removenode' worked fine (didn't want to risk losing data). I just don't follow the logic described by the logs.
On Friday, January 6, 2017 5:45 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

On Fri, Jan 6, 2017 at 6:45 PM, Sotirios Delimanolis <sotodel...@yahoo.com> wrote:

I forgot to check nodetool gossipinfo. Still, why does the first check think that the address exists, but the second doesn't?

On Friday, January 6, 2017 1:11 PM, David Berry <dbe...@blackberry.com> wrote:

I’ve encountered this previously where after removing a node, gossip info is retained for 72 hours which doesn’t allow the IP to be reused during that period. You can check how long gossip will retain this information using “nodetool gossipinfo” where the epoch time will be shown with status.

For example….

Nodetool gossipinfo

/10.236.70.199
  generation:1482436691
  heartbeat:3942407
  STATUS:3942404:LEFT,3074457345618261000,1483995662276
  LOAD:3942267:3.60685807E8
  SCHEMA:223625:acbf0adb-1bbe-384a-acd7-6a46609497f1
  DC:20:orion
  RACK:22:r1
  RELEASE_VERSION:4:2.1.16
  RPC_ADDRESS:3:10.236.70.199
  SEVERITY:3942406:0.25094103813171387
  NET_VERSION:1:8
  HOST_ID:2:cd2a767f-3716-4717-9106-52f0380e6184
  TOKENS:15:<hidden>

Converting it from epoch…..

local@img2116saturn101:~$ date -d @$((1483995662276/1000))
Mon Jan 9 21:01:02 UTC 2017

At the time we waited the 72 hour period before reusing the IP, I’ve not used replace_address previously.

From: Sotirios Delimanolis [mailto:sotodel...@yahoo.com]
Sent: Friday, January 6, 2017 2:38 PM
To: User <user@cassandra.apache.org>
Subject: Logs appear to contradict themselves during bootstrap steps

We had a node go down in our cluster and its disk had to be wiped. During that time, all nodes in the cluster have restarted at least once.

We want to add the bad node back to the ring. It has the same IP/hostname. I follow the steps here for "Adding nodes to an existing cluster."

When the process is started up, it reports

  A node with address <hostname>/<address> already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
I found this error message in the StorageService, which uses the Gossiper instance to look up the node's state. Apparently, the node knows about it. So I followed the instructions, added the cassandra.replace_address system property, and restarted the process. But it reports

  Cannot replace_address /<address> because it doesn't exist in gossip

So which one is it? Does the ring know about it or not? Running "nodetool ring" does show it on all other nodes.

I've seen CASSANDRA-8138 and the conditions are the same, but I can't understand why it thinks it's not part of gossip. What's the difference between the gossip check used to make this determination and the gossip check used for the first error message? Can someone explain?

I've since retrieved the node's id and used it to "nodetool removenode". After rebalancing, I added the node back and ran "nodetool cleanup". Everything's up and running, but I'd like to understand what Cassandra was doing.

In case you have not seen it, check out http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsAssassinate.html; this is what you do when you really want something to go away from gossip.
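
For anyone who wants to repeat the check from the gossipinfo example quoted above without converting the epoch by hand, here is a minimal sketch. It is not part of nodetool: the function name gossip_left_expiry is made up for illustration, it assumes the last comma-separated field of a LEFT status is the expiry in epoch milliseconds (as in that example), and it relies on GNU date.

```shell
# Hypothetical helper (not a nodetool feature): reads `nodetool gossipinfo`
# output on stdin and prints the expiry time of every STATUS line in LEFT state.
gossip_left_expiry() {
  # Assumption: the last comma-separated field of a LEFT status is the
  # expiry in epoch milliseconds, as in the example quoted in this thread.
  awk -F, '/STATUS:.*:LEFT,/ { print $NF }' |
  while read -r millis; do
    # GNU date: convert epoch seconds to a UTC timestamp.
    date -u -d "@$((millis / 1000))" '+%Y-%m-%d %H:%M:%S UTC'
  done
}

# Usage against a live node (assumes nodetool is on PATH):
#   nodetool gossipinfo | gossip_left_expiry
```

If a line is printed for the address you are trying to reuse, gossip still remembers the node as having left, and the timestamp is roughly when that entry should age out.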