On Fri, Jan 6, 2017 at 6:45 PM, Sotirios Delimanolis <sotodel...@yahoo.com> wrote:
> I forgot to check nodetool gossipinfo. Still, why does the first check
> think that the address exists, but the second doesn't?
>
>
> On Friday, January 6, 2017 1:11 PM, David Berry <dbe...@blackberry.com> wrote:
>
>
> I’ve encountered this previously where, after removing a node, gossip info
> is retained for 72 hours, which doesn’t allow the IP to be reused during
> that period. You can check how long gossip will retain this information
> using “nodetool gossipinfo”, where the epoch timestamp is shown with the status.
>
> For example:
>
> nodetool gossipinfo
>
> /10.236.70.199
>   generation:1482436691
>   heartbeat:3942407
>   STATUS:3942404:LEFT,3074457345618261000,1483995662276
>   LOAD:3942267:3.60685807E8
>   SCHEMA:223625:acbf0adb-1bbe-384a-acd7-6a46609497f1
>   DC:20:orion
>   RACK:22:r1
>   RELEASE_VERSION:4:2.1.16
>   RPC_ADDRESS:3:10.236.70.199
>   SEVERITY:3942406:0.25094103813171387
>   NET_VERSION:1:8
>   HOST_ID:2:cd2a767f-3716-4717-9106-52f0380e6184
>   TOKENS:15:<hidden>
>
> Converting the STATUS timestamp from epoch milliseconds:
>
> local@img2116saturn101:~$ date -d @$((1483995662276/1000))
> Mon Jan 9 21:01:02 UTC 2017
>
> At the time we waited out the 72-hour period before reusing the IP; I’ve
> not used replace_address previously.
>
>
> From: Sotirios Delimanolis [mailto:sotodel...@yahoo.com]
> Sent: Friday, January 6, 2017 2:38 PM
> To: User <user@cassandra.apache.org>
> Subject: Logs appear to contradict themselves during bootstrap steps
>
> We had a node go down in our cluster and its disk had to be wiped. During
> that time, all nodes in the cluster have restarted at least once.
>
> We want to add the bad node back to the ring. It has the same IP/hostname.
> I followed the steps here
> <https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_node_to_cluster_t.html>
> for "Adding nodes to an existing cluster."
>
> When the process is started up, it reports:
>
> A node with address <hostname>/<address> already exists, cancelling join.
> Use cassandra.replace_address if you want to replace this node.
>
> I found this error message in StorageService, which uses the Gossiper
> instance to look up the node's state. Apparently, the node knows about it.
> So I followed the instructions, added the cassandra.replace_address
> system property, and restarted the process.
>
> But it reports:
>
> Cannot replace_address /<address> because it doesn't exist in gossip
>
> So which one is it? Does the ring know about the node or not? Running
> "nodetool ring" does show it on all other nodes.
>
> I've seen CASSANDRA-8138
> <https://issues.apache.org/jira/browse/CASSANDRA-8138> and the conditions
> are the same, but I can't understand why it thinks the node is not part of
> gossip. What's the difference between the gossip check used to make this
> determination and the gossip check used for the first error message? Can
> someone explain?
>
> I've since retrieved the node's host ID and used it with "nodetool
> removenode". After rebalancing, I added the node back and ran "nodetool
> cleanup". Everything's up and running, but I'd like to understand what
> Cassandra was doing.


In case you have not seen it, check out
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsAssassinate.html
This is what you use when you really want something to go away from gossip.
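For reference, a small sketch of that cleanup path, assuming a 3.x-era nodetool and GNU date. The IP and timestamp are taken from David's example output above; the 2.1-era JMX route is only noted in comments, since 2.1 has no nodetool subcommand for it.

```shell
# Address stuck in gossip (value from the example gossipinfo output above).
DEAD_IP=10.236.70.199

# The third field of the STATUS line (LEFT,<token>,<millis>) is the epoch
# time in milliseconds at which the LEFT entry expires; GNU date converts it.
# For 1483995662276 this prints Mon Jan 9 21:01:02 UTC 2017.
date -u -d @$((1483995662276/1000))

# To inspect the dead node's gossip state and then force it out (3.x+):
#   nodetool gossipinfo | grep -A 3 "$DEAD_IP"
#   nodetool assassinate "$DEAD_IP"
#
# On 2.1 the equivalent is the Gossiper MBean's unsafeAssassinateEndpoint
# operation, invoked over JMX (e.g. with a JMX client such as jmxterm).
```

Assassinating skips the safety checks of removenode, so waiting out the 72-hour expiry or using removenode first, as you did, is the gentler option.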