[ https://issues.apache.org/jira/browse/CASSANDRA-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-6615:
----------------------------------------

    Attachment: 6615.txt

Host ID conflicts are roughly as important as token conflicts and need to be
handled the same way: decisively.  We can decide who has won a host ID
conflict, much like we do for a token conflict.  Once the loser is removed from
tMD (TokenMetadata), we can just let the FD (failure detector) mark it dead,
and it will then be evicted as a fat client (which is how it worked before we
added host IDs).  However, post-4375 this will take quite a while, since the
only sample the FD has is the seed value of 30s.  While this is actually OK as
long as we've removed the loser from tMD, we can do better, so we call
removeEndpoint, which in turn removes it from the FD but doesn't mark the
epstate as dead.  isFatClient began checking whether the epstate was dead in
CASSANDRA-5378, but this doesn't seem necessary: the timestamp is updated if
the node is actually alive, and the duration check will prevent it from being
expired.  This patch therefore removes that check.

One subtlety: if the host IDs conflict and the loser is in tMD, the token
conflict check is basically useless, because we have to update the host ID
before the tokens, and the token check relies on data in tMD.  This means that
if a host ID conflict occurs where the tokens differ, the loser's tokens may
simply vanish.  That is highly unlikely to happen without hand-editing the
system table or crafting one specifically for this.

> Changing the IP of a node on a live cluster leaves gossip infos and throws 
> Exceptions
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6615
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6615
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Fabien Rousseau
>            Assignee: Brandon Williams
>             Fix For: 1.2.14
>
>         Attachments: 6615.txt
>
>
> Following this procedure:
> https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/
> to change the IP of a node, we encountered an issue:
>  - the logs contain: "java.lang.RuntimeException: Host ID collision between 
> active endpoint /127.0.0.5 and /127.0.0.3"
>  - the logs also indicate that the old IP is being removed from the cluster 
> (FatClient timeout), then added again...
>  - nodetool gossipinfo still lists the old IP (even a few hours later)
>  - the old IP is still seen as "UP" in the cluster (according to the logs)
> Below is a small shell script that reproduces the scenario:
> {noformat}
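> #!/bin/bash
> # Usage: pass the ccm cluster name as the first argument.
> CLUSTER=${1:?usage: $0 <cluster-name>}
> ccm create "$CLUSTER" --cassandra-dir=.
> ccm populate -n 2
> ccm start
> # Add node3 on 127.0.0.3 (JMX port 7300) and start it
> ccm add node3 -i 127.0.0.3 -j 7300 -b
> ccm node3 start
> ccm node3 ring
> # Stop node3 and move it from 127.0.0.3 to 127.0.0.5
> ccm node3 stop
> sed -i 's/127\.0\.0\.3/127.0.0.5/g' ~/.ccm/"$CLUSTER"/node3/node.conf
> sed -i 's/127\.0\.0\.3/127.0.0.5/g' ~/.ccm/"$CLUSTER"/node3/conf/cassandra.yaml
> ccm node3 start
> sleep 3
> nodetool --host 127.0.0.5 --port 7300 gossipinfo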
> {noformat}
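After running the reproduction script, the reported symptom (the old IP still listed by nodetool gossipinfo) can be checked mechanically. The helper below is a hypothetical sketch, not part of the report, the patch, ccm, or Cassandra: the function name endpoint_in_gossip is invented here, and it assumes each endpoint section of the gossipinfo output begins with a line ending in "/<ip>" (for example "/127.0.0.3", or "localhost/127.0.0.3" if a hostname is printed).

```shell
#!/bin/bash
# Hypothetical helper (not part of Cassandra or ccm): returns success if the
# given IP appears as an endpoint header line in `nodetool gossipinfo`
# output read from stdin.
# Assumption: endpoint headers look like "/127.0.0.3" or "host/127.0.0.3".
endpoint_in_gossip() {
  local ip="$1"
  # Compare the text after the last "/" on each line with the IP exactly,
  # so 127.0.0.3 does not match 127.0.0.30. Reading all input (no early
  # exit) avoids breaking the pipe to nodetool.
  awk -v ip="$ip" '
    { n = split($0, parts, "/"); if (n > 1 && parts[n] == ip) found = 1 }
    END { exit !found }
  '
}
```

For example: `nodetool --host 127.0.0.5 --port 7300 gossipinfo | endpoint_in_gossip 127.0.0.3 && echo "old IP still in gossip"`. Splitting on "/" rather than using a regex avoids having to escape the dots in the IP address.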



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
