[
https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294364#comment-14294364
]
Russ Hatch commented on CASSANDRA-8072:
---------------------------------------
I have been able to reproduce this issue using the steps provided in
CASSANDRA-8422.
During one repro attempt I ran 'nodetool gossipinfo' in a loop on the
non-seed node (just before the reported exception was expected to occur), and
was surprised to see it complain about attempting to use a closed connection.
Using netstat I had a look at both the seed and the non-seed node, and could
see a lingering CLOSE_WAIT connection on the seed node. I'm wondering if
Cassandra could somehow be trying to reuse this stale connection, making the
seed unable to connect back to the non-seed (and making the non-seed think the
seed is unavailable).
It may also be relevant that the non-seed node has a connection in state
FIN_WAIT2 for approx. 10-15 seconds after stopping the Cassandra process.
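For anyone who wants to repeat the netstat check above while reproducing, here is a
rough sketch of how it could be scripted. It is not part of Cassandra or of any
proposed fix. It assumes Linux-style `netstat -tan` output and the default
inter-node storage port 7000; the function names are made up for illustration.

```python
import subprocess

# TCP states mentioned in the comment above as lingering around a restart.
HALF_CLOSED_STATES = {"CLOSE_WAIT", "FIN_WAIT2"}


def find_half_closed(netstat_output, port=7000):
    """Return (local, remote, state) tuples for half-closed connections on `port`.

    Assumes the column layout printed by Linux `netstat -tan`:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    The default of 7000 is Cassandra's default storage_port; adjust it if the
    cluster is configured differently.
    """
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row with a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state not in HALF_CLOSED_STATES:
            continue
        # The port is whatever follows the last ':' (works for IPv4 and IPv6).
        ports = {local.rsplit(":", 1)[-1], remote.rsplit(":", 1)[-1]}
        if str(port) in ports:
            matches.append((local, remote, state))
    return matches


def scan_local(port=7000):
    """Run netstat on this host and print any half-closed connections found."""
    result = subprocess.run(
        ["netstat", "-tan"], capture_output=True, text=True, check=True
    )
    for local, remote, state in find_half_closed(result.stdout, port):
        print(f"{state:<11} {local} -> {remote}")
```

Calling `scan_local()` on both the seed and the non-seed node just before
restarting the non-seed should show whether the CLOSE_WAIT / FIN_WAIT2 entries
described above are present at that moment.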
> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
> Key: CASSANDRA-8072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
> Project: Cassandra
> Issue Type: Bug
> Reporter: Ryan Springer
> Assignee: Brandon Williams
> Attachments: casandra-system-log-with-assert-patch.log
>
>
> When OpsCenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster,
> either in EC2 or locally, one of the nodes sometimes refuses to start C*.
> The error in /var/log/cassandra/system.log is:
> ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
>     at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
>     at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444)
>     at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:609)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
> INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java (line 1279) Announcing shutdown
> INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 MessagingService.java (line 701) Waiting for messaging service to quiesce
> INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 MessagingService.java (line 941) MessagingService has terminated the accept() thread
> This error does not always occur when provisioning a 2-node cluster, but
> probably around half of the time, and on only one of the nodes. I haven't been
> able to reproduce this error with DSC 2.0.9, and there have been no code or
> definition file changes in OpsCenter.
> I can reproduce locally with the above steps. I'm happy to test any proposed
> fixes since I'm the only person able to reproduce reliably so far.