[jira] [Commented] (CASSANDRA-8274) Node fails to rejoin cluster on EC2 if private IP is changed

Chris mildebrandt (JIRA) Fri, 22 Sep 2017 21:37:44 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16177493#comment-16177493 ]


Chris mildebrandt commented on CASSANDRA-8274:
----------------------------------------------

I just hit this issue today with the 3.11.0 Docker image running in Kubernetes.
I had four nodes in the Cassandra cluster; two of them were restarted and now
can't rejoin. There's one seed that is up and reachable from all the other
containers, and one other member that is able to join. The first exception I
see is this:
{{java.lang.RuntimeException: Cache schema version 38e97a53-563b-3074-b86f-c81efa980524 does not match current schema version 1bfdabae-743e-357e-a661-93984c26bc32
        at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:206) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:164) [apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:160) [apache-cassandra-3.11.0.jar:3.11.0]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]}}

Then I see the one related to this issue:
{{java.lang.RuntimeException: Unable to gossip with any seeds
        at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1413) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:550) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:801) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:666) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:393) [apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600) [apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689) [apache-cassandra-3.11.0.jar:3.11.0]}}

Restarting the nodes didn't help. {{nodetool status}} now reports only two
nodes, and {{nodetool gossipinfo}} shows three "empty" entries:

{{/100.96.3.164
  generation:0
  heartbeat:0
  TOKENS: not present
/100.96.1.7
  generation:0
  heartbeat:0
  TOKENS: not present
/100.96.2.170
  generation:0
  heartbeat:0
  TOKENS: not present}}
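On a larger cluster these placeholder entries can be picked out by scanning the output for endpoints whose generation and heartbeat are both 0. The following is a minimal Python sketch, not part of Cassandra or nodetool; it assumes the layout shown above (an unindented endpoint line such as /100.96.3.164, followed by indented KEY:value lines), and the function names are hypothetical.

```python
from typing import Dict, List


def parse_gossipinfo(text: str) -> Dict[str, Dict[str, str]]:
    """Group `nodetool gossipinfo` text into {endpoint: {field: value}}.

    Assumption: endpoint lines are unindented ("/100.96.3.164" or
    "host/100.96.3.164") and state lines are indented "KEY:value" pairs.
    """
    endpoints: Dict[str, Dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if not raw[0].isspace():
            # Endpoint header: keep only the address after the last "/".
            current = raw.strip().rsplit("/", 1)[-1]
            endpoints[current] = {}
        elif current is not None and ":" in raw:
            # State line: split on the first ":" only.
            key, value = raw.strip().split(":", 1)
            endpoints[current][key.strip()] = value.strip()
    return endpoints


def empty_entries(endpoints: Dict[str, Dict[str, str]]) -> List[str]:
    """Endpoints whose generation and heartbeat are both 0, sorted as strings."""
    return sorted(
        ep
        for ep, fields in endpoints.items()
        if fields.get("generation") == "0" and fields.get("heartbeat") == "0"
    )


if __name__ == "__main__":
    # The three entries from the output above.
    sample = "\n".join([
        "/100.96.3.164", "  generation:0", "  heartbeat:0", "  TOKENS: not present",
        "/100.96.1.7", "  generation:0", "  heartbeat:0", "  TOKENS: not present",
        "/100.96.2.170", "  generation:0", "  heartbeat:0", "  TOKENS: not present",
    ])
    print(empty_entries(parse_gossipinfo(sample)))
    # prints ['100.96.1.7', '100.96.2.170', '100.96.3.164']
```

Feeding it the saved output of {{nodetool gossipinfo}} from each live node would show whether they all agree on which peers are stuck in this state.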


> Node fails to rejoin cluster on EC2 if private IP is changed
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-8274
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8274
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>         Environment: Amazon EC2
>            Reporter: Joseph Clark
>            Priority: Minor
>             Fix For: 3.11.x
>
>
> Nodes in Amazon AWS EC2 Classic (not a VPC) may be assigned a new private IP 
> if the node is stopped and then started again. In this case we have Puppet 
> update the configured listen_address to the new private IP. However, once the 
> Cassandra service starts, it is unable to communicate with the existing 
> nodes (single region), and vice versa.
> 'nodetool status' shows that each node believes that it is 'UN' (Up/Normal) 
> and the other node is 'DN' (Down/Normal).
> 'nodetool gossipinfo' on the node that remained running shows the *old* 
> private IP listed as the 'INTERNAL_IP' of the node that was stopped and 
> restarted. 
> The situation is resolved by restarting the Cassandra service on the node 
> that remained running. Once it has restarted, the INTERNAL_IP is correctly 
> updated to the new private IP. 'nodetool status' shows that both nodes are up 
> and the cluster appears to function normally.
> This appears to me to be the root cause of 
> https://issues.apache.org/jira/browse/CASSANDRA-7292. -Possibly 
> https://issues.apache.org/jira/browse/CASSANDRA-8072 as well, but I am not 
> convinced they are actually duplicates.-
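The stale address described in the report can be checked by comparing the INTERNAL_IP each endpoint advertises in gossip against the private IPs you expect (for example, the listen_address values Puppet has just written). Below is a minimal Python sketch under stated assumptions: the expected-address set is supplied by the caller, the function names are hypothetical, and because state lines may be printed as KEY:version:value, it takes the text after the last colon, which only works for IPv4 addresses.

```python
from typing import Dict, Iterable, List, Tuple


def advertised_internal_ips(gossipinfo: str) -> Dict[str, str]:
    """Map each endpoint in `nodetool gossipinfo` output to its INTERNAL_IP."""
    result: Dict[str, str] = {}
    endpoint = None
    for raw in gossipinfo.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if not raw[0].isspace():
            # Unindented endpoint header, e.g. "/54.1.2.3" or "host/54.1.2.3".
            endpoint = line.rsplit("/", 1)[-1]
        elif endpoint is not None and line.startswith("INTERNAL_IP:"):
            # Assumption: the address is the last ':'-separated field, which
            # covers "INTERNAL_IP:10.0.0.5" and "INTERNAL_IP:9:10.0.0.5"
            # (IPv4 only; IPv6 addresses contain ':' and would break this).
            result[endpoint] = line.rsplit(":", 1)[-1]
    return result


def stale_internal_ips(gossipinfo: str, expected: Iterable[str]) -> List[Tuple[str, str]]:
    """Return sorted (endpoint, INTERNAL_IP) pairs whose address is not expected."""
    current = set(expected)
    return sorted(
        (ep, ip)
        for ep, ip in advertised_internal_ips(gossipinfo).items()
        if ip not in current
    )
```

Run against output taken from the node that remained running, a non-empty result would point at the peer whose new private IP has not replaced the old one in gossip, which is the symptom described above.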



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)