Logs appear to contradict themselves during bootstrap steps

Sotirios Delimanolis Fri, 06 Jan 2017 11:38:03 -0800

We had a node go down in our cluster and its disk had to be wiped. During that 
time, all nodes in the cluster restarted at least once.
We want to add the bad node back to the ring. It has the same IP/hostname. I 
followed the documented steps for "Adding nodes to an existing cluster."
When the process starts, it reports:
A node with address <hostname>/<address> already exists, cancelling join. Use 
cassandra.replace_address if you want to replace this node.


I found that StorageService raises this error after using the Gossiper instance 
to look up the node's state, so gossip apparently does know about the node. I 
followed the instructions, added the cassandra.replace_address system property, 
and restarted the process.
But then it reports:
Cannot replace_address /<address> because it doesn't exist in gossip
So which one is it? Does the ring know about it or not? Running "nodetool ring" 
does show it on all other nodes.
I've seen CASSANDRA-8138 and the conditions are the same, but I can't understand 
why it thinks the node is not part of gossip. What's the difference between the 
gossip check behind this error and the gossip check behind the first error 
message? Can someone explain?
I've since retrieved the node's host ID and passed it to "nodetool removenode". 
After rebalancing, I added the node back and ran "nodetool cleanup". Everything's 
up and running, but I'd like to understand what Cassandra was doing.
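For reference, here is a minimal sketch of how the cassandra.replace_address system property mentioned above is usually passed. It assumes a package install where JVM options are set in cassandra-env.sh; the message doesn't say how the property was actually set. `<address>` is the same redacted IP as in the error messages.

```shell
# cassandra-env.sh on the node doing the replacement (assumed location of JVM options).
# <address> is a placeholder for the IP of the node being replaced.
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<address>"
```

The option is only meant for the replacement start-up. Once the node has finished bootstrapping, it is normally removed again so later restarts don't attempt another replacement.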
