[
https://issues.apache.org/jira/browse/CASSANDRA-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brandon Williams updated CASSANDRA-6615:
----------------------------------------
Attachment: 6615.txt
Host ID conflicts are roughly as important as token conflicts and need to be
handled the same way: decisively. We can decide who has won a host ID
conflict, much as we do for a token conflict. Once the loser is removed from
tMD, we can just let the FD mark it dead, and it will then be evicted as a fat
client (which is how it worked before we added host IDs). However, post-4375
this will take quite a while, since the only sample the FD has is the seed
value of 30s. While this is actually OK as long as we've removed it from tMD,
we can do better: we call removeEndpoint, which in turn removes it from the FD
but doesn't mark the epstate as dead. isFatClient began checking whether the
epstate was dead in CASSANDRA-5378, but this doesn't seem necessary, since the
timestamp is updated if the node is actually alive and the duration check will
prevent it from being expired, so this patch removes that check.
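
To make the flow above concrete, here is a rough, self-contained sketch. It is
not the code in 6615.txt: the class, method, and field names are hypothetical,
and it assumes the winner is the endpoint with the newer startup generation,
by analogy with token conflicts. The eviction check relies only on membership
and on how long the endpoint has been silent, not on a dead flag.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified model; these are NOT Cassandra's actual classes.
class HostIdConflictSketch {
    // host ID -> endpoint that currently owns it (stand-in for tMD)
    final Map<UUID, String> ownerByHostId = new HashMap<>();
    // endpoint -> startup generation (assumed tiebreaker: newer wins)
    final Map<String, Integer> generation = new HashMap<>();
    // endpoint -> last time (ms) its gossip state was updated
    final Map<String, Long> lastUpdateMillis = new HashMap<>();

    /** Records a host ID claim; returns the losing endpoint, or null if there is no conflict. */
    String claimHostId(String endpoint, UUID hostId, int gen, long nowMillis) {
        generation.put(endpoint, gen);
        lastUpdateMillis.put(endpoint, nowMillis);
        String existing = ownerByHostId.get(hostId);
        if (existing == null || existing.equals(endpoint)) {
            ownerByHostId.put(hostId, endpoint);
            return null;
        }
        if (gen > generation.get(existing)) {
            // Newcomer wins: the old owner drops out of the membership view.
            ownerByHostId.put(hostId, endpoint);
            return existing;
        }
        return endpoint; // Newcomer lost; the existing owner keeps the host ID.
    }

    boolean isMember(String endpoint) {
        return ownerByHostId.containsValue(endpoint);
    }

    /** Fat-client-style eviction: a non-member that has been silent longer than the timeout. */
    boolean shouldEvict(String endpoint, long nowMillis, long timeoutMillis) {
        Long last = lastUpdateMillis.get(endpoint);
        if (last == null || isMember(endpoint)) {
            return false;
        }
        // A live node keeps refreshing its timestamp, so this check alone protects it.
        return nowMillis - last > timeoutMillis;
    }

    public static void main(String[] args) {
        HostIdConflictSketch s = new HostIdConflictSketch();
        UUID id = UUID.randomUUID();
        s.claimHostId("/127.0.0.3", id, 1, 0L);
        String loser = s.claimHostId("/127.0.0.5", id, 2, 1_000L);
        System.out.println("loser: " + loser);
        System.out.println("evict loser later: " + s.shouldEvict(loser, 60_000L, 30_000L));
    }
}
```

The timeout is passed in as a parameter because the sketch makes no claim about
the value the real gossiper uses.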
One small bit of nuance here is that if the host IDs conflict and the loser is
in tMD, then the token conflict check is basically useless, since we have to
update the host ID before the tokens, and the token check relies on data in
tMD. This means if a host ID conflict occurs where the tokens are different,
the loser's tokens may just vanish, but that's highly unlikely to occur without
hand-editing the system table or crafting one specifically for this.
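
The ordering nuance can be illustrated with a small toy model. Everything in
it is an assumption made for illustration (the names, the two maps standing in
for tMD, and the rule that the newcomer always wins the host ID conflict); it
only shows why a token check that runs after the host ID update has nothing
left to compare against.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical toy model of the update ordering; NOT Cassandra's actual classes.
class OrderingNuanceSketch {
    final Map<String, UUID> hostIdByEndpoint = new HashMap<>();
    final Map<Long, String> endpointByToken = new TreeMap<>();

    // Assumption: removing an endpoint drops both its host ID and its tokens.
    void removeEndpoint(String endpoint) {
        hostIdByEndpoint.remove(endpoint);
        endpointByToken.values().removeIf(endpoint::equals);
    }

    /** Applies a join and returns how many token conflicts the token check saw. */
    int handleJoin(String endpoint, UUID hostId, Collection<Long> tokens) {
        // Step 1: host ID update. Assume the newcomer wins any conflict.
        String loser = null;
        for (Map.Entry<String, UUID> e : hostIdByEndpoint.entrySet()) {
            if (e.getValue().equals(hostId) && !e.getKey().equals(endpoint)) {
                loser = e.getKey();
            }
        }
        if (loser != null) {
            removeEndpoint(loser);
        }
        hostIdByEndpoint.put(endpoint, hostId);

        // Step 2: token check. It only sees what is still in the maps.
        int conflicts = 0;
        for (Long token : tokens) {
            String owner = endpointByToken.get(token);
            if (owner != null && !owner.equals(endpoint)) {
                conflicts++;
            }
            endpointByToken.put(token, endpoint);
        }
        return conflicts;
    }
}
```

In this model, if the loser held token 10 and the winner arrives with token 20,
step 2 reports no conflict and token 10 simply disappears from the map.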
> Changing the IP of a node on a live cluster leaves gossip infos and throws
> Exceptions
> -------------------------------------------------------------------------------------
>
> Key: CASSANDRA-6615
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6615
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Fabien Rousseau
> Assignee: Brandon Williams
> Fix For: 1.2.14
>
> Attachments: 6615.txt
>
>
> Following this procedure:
> https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/
> to change the IP of a node, we encountered an issue:
> - logs contain: "java.lang.RuntimeException: Host ID collision between
> active endpoint /127.0.0.5 and /127.0.0.3"
> - logs also indicate that the old IP is being removed from the cluster
> (FatClient timeout), then added again...
> - nodetool gossipinfo still lists the old IP (even a few hours later...)
> - the old IP is still seen as "UP" in the cluster... (according to the
> logs...)
> Below is a small shell script that reproduces the scenario...
> {noformat}
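> #! /bin/bash
> # Usage: pass the name of the ccm cluster to create as the first argument.
> CLUSTER=$1
> ccm create "$CLUSTER" --cassandra-dir=.
> ccm populate -n 2
> ccm start
> # Add a third node on 127.0.0.3, start it, check the ring, then stop it.
> ccm add node3 -i 127.0.0.3 -j 7300 -b
> ccm node3 start
> ccm node3 ring
> ccm node3 stop
> # Change node3's IP to 127.0.0.5 while it is stopped, then restart it.
> sed -i 's/127\.0\.0\.3/127.0.0.5/g' ~/.ccm/"$CLUSTER"/node3/node.conf
> sed -i 's/127\.0\.0\.3/127.0.0.5/g' ~/.ccm/"$CLUSTER"/node3/conf/cassandra.yaml
> ccm node3 start
> sleep 3
> # Per the report, gossipinfo still lists the old IP (127.0.0.3) here.
> nodetool --host 127.0.0.5 --port 7300 gossipinfo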
> {noformat}