Duncan Sands created CASSANDRA-5769:
---------------------------------------
Summary: Not all STATUS_CHANGE UP events reported via the native
protocol
Key: CASSANDRA-5769
URL: https://issues.apache.org/jira/browse/CASSANDRA-5769
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.2.6, 1.2.5
Environment: Ubuntu 12.04, x86, 64 bit
Reporter: Duncan Sands
Priority: Minor
Not all gossip UP events are pushed to native protocol users who have
registered for them. This seems to be a native protocol issue because nodes
themselves get the UP event (as seen in their logs). I can consistently
reproduce this issue as follows:
1) connect a client to a cluster node ("node1") using the native protocol,
register for TOPOLOGY_CHANGE and STATUS_CHANGE events. (Probably you only need
to register for STATUS_CHANGE to see this, however my client registers for
both).
2) on another node ("node2"), send SIGSTOP to the Cassandra process.
3) after about 30 seconds the client gets pushed a STATUS_CHANGE DOWN event for
the stopped node.
4) on node2, send SIGCONT to the Cassandra process.
5) wait forever to get a STATUS_CHANGE UP event. This is the failure: no event is
ever received.
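
For anyone reproducing step 1 without a full driver, here is a minimal Python sketch of the two wire-level pieces involved: building a REGISTER request and decoding the body of a pushed STATUS_CHANGE event. It assumes version 1 of the native protocol (the one shipped with Cassandra 1.2), where REGISTER is opcode 0x0B and an [inet] value is a length byte, the address bytes, and an int port. The function names are hypothetical and not taken from any driver. A real client must also send STARTUP first and read the frame header off the socket before it gets to the event body.

```python
import socket
import struct

# Assumed native protocol v1 constants.
REQUEST_VERSION = 0x01
OPCODE_REGISTER = 0x0B


def build_register_frame(event_types, stream_id=1):
    """Build a v1 REGISTER frame whose body is a [string list] of event types."""
    body = struct.pack(">H", len(event_types))
    for name in event_types:
        raw = name.encode("ascii")
        body += struct.pack(">H", len(raw)) + raw
    # Header: version, flags, stream id (signed byte), opcode, body length.
    header = struct.pack(">BBbBI", REQUEST_VERSION, 0x00, stream_id,
                         OPCODE_REGISTER, len(body))
    return header + body


def _read_string(buf, pos):
    """Read a [string]: a 2-byte length followed by that many bytes."""
    (length,) = struct.unpack_from(">H", buf, pos)
    start = pos + 2
    return buf[start:start + length].decode("utf-8"), start + length


def parse_status_change(body):
    """Decode an EVENT body of type STATUS_CHANGE into (status, address, port)."""
    event_type, pos = _read_string(body, 0)
    if event_type != "STATUS_CHANGE":
        raise ValueError("not a STATUS_CHANGE event: %r" % event_type)
    status, pos = _read_string(body, pos)  # "UP" or "DOWN"
    size = body[pos]                       # 4 for IPv4, 16 for IPv6
    pos += 1
    address = body[pos:pos + size]
    pos += size
    (port,) = struct.unpack_from(">i", body, pos)
    family = socket.AF_INET if size == 4 else socket.AF_INET6
    return status, socket.inet_ntop(family, address), port
```

Calling build_register_frame(["TOPOLOGY_CHANGE", "STATUS_CHANGE"]) gives the bytes to write once the connection is ready. In the failing scenario above, the client receives an event that decodes to "DOWN" for node2 but never one that decodes to "UP".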
Observe that node1 does know that node2 is back up: in its system log I see for
example
INFO [GossipStage:1] 2013-07-17 14:27:41,238 Gossiper.java (line 771)
InetAddress /172.18.34.169 is now UP
shortly after sending SIGCONT to the stopped process.
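
The SIGSTOP/SIGCONT pair from steps 2 and 4 can be scripted so that the pause length is repeatable. This is a minimal sketch, assuming a POSIX system and a caller allowed to signal the Cassandra process. The helper name pause_process is hypothetical, and finding the PID on node2 is left to the reader.

```python
import os
import signal
import time


def pause_process(pid, seconds):
    """Freeze process `pid` with SIGSTOP, wait, then resume it with SIGCONT."""
    os.kill(pid, signal.SIGSTOP)
    try:
        time.sleep(seconds)
    finally:
        # Always resume, even if the wait is interrupted.
        os.kill(pid, signal.SIGCONT)
```

The pause should be longer than the roughly 30 seconds it takes for the DOWN event to arrive; otherwise the node may never be marked down in the first place.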
To eliminate the possibility that my client is at fault, I performed the
following sanity check:
2') on node2, stopped Cassandra nicely using: sudo service cassandra stop
4') on node2, restarted Cassandra using: sudo service cassandra start
In this case the client soon after gets a STATUS_CHANGE DOWN event followed by
a STATUS_CHANGE UP event for node2.