[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490401#comment-14490401 ]
Brandon Williams commented on CASSANDRA-8072:
---------------------------------------------
Thanks for the logs! So, the relevant portion of these logs is here:
{noformat}
DEBUG 20:53:50,981 Starting shadow gossip round to check for endpoint collision
INFO 20:53:51,304 Starting Encrypted Messaging Service on SSL port 7001
INFO 20:53:51,312 Starting Messaging Service on port 7000
INFO 20:53:51,315 Loading settings from file:/etc/cassandra/conf/cassandra.yaml
TRACE 20:53:51,336 /54.219.189.161 sending GOSSIP_DIGEST_SYN to 1@/54.219.189.162
TRACE 20:53:51,353 /54.219.189.161 sending GOSSIP_DIGEST_SYN to 2@/54.219.189.163
DEBUG 20:53:51,354 attempting to connect to /54.219.189.162
TRACE 20:53:51,354 Assuming current protocol version for /54.219.189.162
TRACE 20:53:51,355 Filtering org.apache.cassandra.db.ColumnFamilyStore$9@496cc7de for rows matching org.apache.cassandra.db.filter.ExtendedFilter$EmptyClauseFilter@4b5e57b
DEBUG 20:53:51,359 attempting to connect to /54.219.189.163
TRACE 20:53:51,360 Assuming current protocol version for /54.219.189.163
INFO 20:53:51,543 Handshaking version with cas-dev-dt-01-uw1-cassandra-seed02.localdomain-ext/54.219.189.163
INFO 20:53:51,544 Handshaking version with cas-dev-dt-01-uw1-cassandra-seed01.localdomain-ext/54.219.189.162
DEBUG 20:53:51,583 Setting version 7 for cas-dev-dt-01-uw1-cassandra-seed02.localdomain-ext/54.219.189.163
DEBUG 20:53:51,586 Setting version 7 for cas-dev-dt-01-uw1-cassandra-seed01.localdomain-ext/54.219.189.162
TRACE 20:53:51,586 Upgrading OutputStream to be compressed
TRACE 20:53:51,588 Upgrading OutputStream to be compressed
TRACE 20:53:55,247 Expired 0 entries
DEBUG 20:53:55,598 GC for ConcurrentMarkSweep: 71 ms for 1 collections, 240266176 used; max is 7935623168
TRACE 20:54:00,248 Expired 0 entries
TRACE 20:54:05,248 Expired 0 entries
TRACE 20:54:10,249 Expired 0 entries
TRACE 20:54:15,249 Expired 0 entries
TRACE 20:54:20,250 Expired 0 entries
ERROR 20:54:22,360 Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
{noformat}
We can see that this node sent the GOSSIP_DIGEST_SYN, which caused
MessagingService to connect to both seeds, and the version handshake
succeeded (version 7), so everything worked as expected up to this point.
What we don't know is why neither seed replied to the SYN: whether they tried
to reply and failed, or never received the SYN at all. Without TRACE-level
logs from one of the seeds, we won't be able to tell.
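To make the same check repeatable on other logs, here is a minimal Python
sketch, not part of Cassandra. It assumes each log entry occupies a single
line in the "LEVEL HH:MM:SS,mmm message" layout shown above. The function name
summarize_shadow_round and the returned keys are hypothetical, and matching
on the substring GOSSIP_DIGEST_ACK is an assumption about how a reply would
appear in TRACE output.

```python
import re

# Layout taken from the excerpt above: "LEVEL HH:MM:SS,mmm message".
LINE_RE = re.compile(
    r"^(TRACE|DEBUG|INFO|WARN|ERROR)\s+(\d{2}):(\d{2}):(\d{2}),(\d{3})\s+(.*)$"
)
ADDR_RE = re.compile(r"/(\d{1,3}(?:\.\d{1,3}){3})")


def summarize_shadow_round(lines):
    """Report which endpoints got a SYN and whether any ACK was logged."""
    syn_targets = []
    ack_seen = False
    start_ms = None
    failure_ms = None
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # stack-trace lines and other continuations
        _, hh, mm, ss, ms, msg = match.groups()
        t = ((int(hh) * 60 + int(mm)) * 60 + int(ss)) * 1000 + int(ms)
        if "Starting shadow gossip round" in msg:
            start_ms = t
        elif "sending GOSSIP_DIGEST_SYN to" in msg:
            # First address is the sender, last one is the target.
            addrs = ADDR_RE.findall(msg)
            if len(addrs) >= 2:
                syn_targets.append(addrs[-1])
        elif "GOSSIP_DIGEST_ACK" in msg:
            # Assumption: a reply would be logged with this verb name.
            ack_seen = True
        elif "Exception encountered during startup" in msg:
            failure_ms = t
    elapsed_ms = None
    if start_ms is not None and failure_ms is not None:
        elapsed_ms = failure_ms - start_ms
    return {
        "syn_targets": syn_targets,
        "ack_seen": ack_seen,
        "elapsed_ms": elapsed_ms,
    }


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        print(summarize_shadow_round(handle))
```

Running it against a seed's TRACE log as well would show whether the SYN
ever reached that side, which is the missing piece here.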
> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
> Key: CASSANDRA-8072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
> Project: Cassandra
> Issue Type: Bug
> Reporter: Ryan Springer
> Assignee: Brandon Williams
> Fix For: 2.0.15, 2.1.5
>
> Attachments: casandra-system-log-with-assert-patch.log,
> trace_logs.tar.bz2
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster
> in either ec2 or locally, an error occurs sometimes with one of the nodes
> refusing to start C*. The error in the /var/log/cassandra/system.log is:
> ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
> at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
> at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444)
> at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:609)
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
> INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java (line 1279) Announcing shutdown
> INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 MessagingService.java (line 701) Waiting for messaging service to quiesce
> INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 MessagingService.java (line 941) MessagingService has terminated the accept() thread
> This error does not always occur when provisioning a 2-node cluster, but
> probably around half of the time, and on only one of the nodes. I haven't
> been able to reproduce this error with DSC 2.0.9, and there have been no
> code or definition file changes in Opscenter.
> I can reproduce locally with the above steps. I'm happy to test any proposed
> fixes since I'm the only person able to reproduce reliably so far.