[ 
https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294364#comment-14294364
 ] 

Russ Hatch commented on CASSANDRA-8072:
---------------------------------------

I have been able to reproduce this issue using the steps provided in 
CASSANDRA-8422.

During one repro attempt I tried to run 'nodetool gossipinfo' in a loop on the 
non-seed node (just before the reported exception was expected to occur), and 
was surprised to see it complain about attempting to use a closed connection.
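
For anyone repeating this, the polling loop could be sketched roughly as below. This is only an illustration, not the exact loop I ran: `poll_command`, its parameters, and the injectable `runner` argument are hypothetical names, and the only thing taken from the steps above is the `nodetool gossipinfo` command itself.

```python
import subprocess
import time


def poll_command(cmd, attempts=5, interval=1.0, runner=subprocess.run):
    """Run `cmd` repeatedly and collect (returncode, stdout, stderr) tuples.

    `runner` defaults to subprocess.run; it is injectable (a hypothetical
    convenience) so the loop can be exercised without a real nodetool.
    """
    results = []
    for i in range(attempts):
        proc = runner(cmd, capture_output=True, text=True)
        results.append((proc.returncode, proc.stdout, proc.stderr))
        # Sleep between attempts, but not after the last one.
        if i < attempts - 1:
            time.sleep(interval)
    return results
```

On the non-seed node this would be called as `poll_command(["nodetool", "gossipinfo"], attempts=30, interval=0.5)`, and the stderr element of each tuple is where a complaint such as the closed-connection one would be expected to show up.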

Using netstat, I looked at both the seed and the non-seed node and saw a 
lingering CLOSE_WAIT connection on the seed node. I wonder whether Cassandra 
could somehow be trying to reuse this stale connection, leaving the seed 
unable to connect back to the non-seed (and making the non-seed think the 
seed is unavailable).

It may also be relevant that the non-seed node has a connection in state 
FIN_WAIT2 for approximately 10-15 seconds after the Cassandra process is 
stopped.
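
To make the netstat check repeatable, something like the following could filter the output for the two states mentioned above. It is a minimal sketch under assumptions: it expects Linux-style `netstat -tan` output, it assumes the inter-node port is 7000 (the Cassandra default `storage_port`; adjust if configured differently), and `lingering_connections` and the `SAMPLE` text are made up for illustration, not captured from the affected nodes.

```python
# Made-up sample in the layout printed by `netstat -tan` on Linux.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:7000          0.0.0.0:*               LISTEN
tcp        1      0 127.0.0.1:7000          127.0.0.2:51234         CLOSE_WAIT
tcp        0      0 127.0.0.1:9042          127.0.0.1:40000         ESTABLISHED
"""


def lingering_connections(netstat_output, port=7000,
                          states=("CLOSE_WAIT", "FIN_WAIT2")):
    """Return (local, remote, state) for TCP rows in one of `states`
    where either endpoint uses `port` (assumed storage port)."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a full TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        ports = {local.rsplit(":", 1)[-1], remote.rsplit(":", 1)[-1]}
        if state in states and str(port) in ports:
            found.append((local, remote, state))
    return found


print(lingering_connections(SAMPLE))
```

Running it against real output on each node right after stopping the other node's Cassandra process should show whether the CLOSE_WAIT and FIN_WAIT2 entries appear consistently.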

> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-8072
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ryan Springer
>            Assignee: Brandon Williams
>         Attachments: casandra-system-log-with-assert-patch.log
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster, 
> either in EC2 or locally, one of the nodes sometimes refuses to start C*.  
> The error in /var/log/cassandra/system.log is:
> ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
>         at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
>         at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444)
>         at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:609)
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java (line 1279) Announcing shutdown
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 MessagingService.java (line 701) Waiting for messaging service to quiesce
>  INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 MessagingService.java (line 941) MessagingService has terminated the accept() thread
> This error does not always occur when provisioning a 2-node cluster; it 
> happens probably around half of the time, and on only one of the nodes.  I 
> haven't been able to reproduce this error with DSC 2.0.9, and there have been 
> no code or definition file changes in Opscenter.
> I can reproduce locally with the above steps.  I'm happy to test any proposed 
> fixes since I'm the only person able to reproduce reliably so far.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
