[
https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295525#comment-14295525
]
Ryan Springer commented on CASSANDRA-8072:
------------------------------------------
Actually, I was wrong about Opscenter waiting until all of the nodes had
installed the DSC package before stopping the nodes. As soon as the package
install has finished for a node, Opscenter will proceed and stop the running
DSC instance for that node without waiting for other nodes to finish their
installs. Opscenter will also reconfigure nodes without waiting for all nodes
to be stopped.
It looks like the node install -> stop -> configure process runs independently
for each node. So we could theoretically have one node in a reconfigured but
stopped state while another node is still installing the packages. Opscenter
will, however, wait until all nodes have been reconfigured before beginning to
restarting DSC on the nodes.
> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
> Key: CASSANDRA-8072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
> Project: Cassandra
> Issue Type: Bug
> Reporter: Ryan Springer
> Assignee: Brandon Williams
> Attachments: casandra-system-log-with-assert-patch.log
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster
> in either ec2 or locally, an error occurs sometimes with one of the nodes
> refusing to start C*. The error in the /var/log/cassandra/system.log is:
> ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513)
> Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
> at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
> at
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444)
> at
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
> at
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:609)
> at
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
> at
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
> at
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
> at
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
> INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java
> (line 1279) Announcing shutdown
> INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326
> MessagingService.java (line 701) Waiting for messaging service to quiesce
> INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327
> MessagingService.java (line 941) MessagingService has terminated the accept()
> thread
> This errors does not always occur when provisioning a 2-node cluster, but
> probably around half of the time on only one of the nodes. I haven't been
> able to reproduce this error with DSC 2.0.9, and there have been no code or
> definition file changes in Opscenter.
> I can reproduce locally with the above steps. I'm happy to test any proposed
> fixes since I'm the only person able to reproduce reliably so far.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)