[ https://issues.apache.org/jira/browse/CASSANDRA-10969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096477#comment-15096477 ]

T. David Hudson commented on CASSANDRA-10969:
---------------------------------------------

If N2 hadn't briefly reported N1 as up in its nodetool status after its first 
restart, it would've been clearer that the behavior was simply N2 picking up an 
old generation from the other old-generation nodes.  I thought, however, that 
gossip had to succeed with a peer before nodetool status would report that peer 
up.  If so, N2 must have gossiped successfully with N1 first, and gossiping 
with N3 or N4 after that, or perhaps some other event, must've put it back in 
its bad in-memory state.  That would suggest gossip can have trouble staying 
current even where the failure scenario being addressed by this bug hasn't 
occurred.
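
For reference, here's a rough standalone sketch of the one-year generation 
check from CASSANDRA-8113 (mentioned in the issue description below), using 
the generations from the quoted log line.  The constant name and structure are 
my approximation, not the exact Gossiper.java code; run as-is it trips the 
check, since the gap (roughly 421 days) exceeds a year:

    // Rough sketch of the one-year generation sanity check (CASSANDRA-8113).
    // Generations are the node's start time in seconds since the epoch; the
    // constant name here is an approximation, not the exact Gossiper.java code.
    public class GenerationGapCheck
    {
        static final long MAX_GENERATION_DIFFERENCE = 86400L * 365; // one year

        public static void main(String[] args)
        {
            long localGeneration = 1414613355L;  // held by the long-running nodes
            long remoteGeneration = 1450978722L; // advertised by the restarted node

            long gapSeconds = remoteGeneration - localGeneration; // ~36.4M s, ~421 days
            if (remoteGeneration > localGeneration + MAX_GENERATION_DIFFERENCE)
                System.out.printf("received an invalid gossip generation for peer; " +
                                  "local generation = %d, received generation = %d%n",
                                  localGeneration, remoteGeneration);
            else
                System.out.println("generation accepted; gap = " + gapSeconds + " s");
        }
    }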


> long-running cluster sees bad gossip generation when a node restarts
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-10969
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10969
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>         Environment: 4-node Cassandra 2.1.1 cluster, each node running on a 
> Linux 2.6.32-431.20.3.el6.x86_64 VM
>            Reporter: T. David Hudson
>            Assignee: Joel Knighton
>            Priority: Minor
>
> One of the nodes in a long-running Cassandra 2.1.1 cluster (not under my 
> control) restarted.  The remaining nodes are logging errors like this:
>     "received an invalid gossip generation for peer xxx.xxx.xxx.xxx; local 
> generation = 1414613355, received generation = 1450978722"
> The gap between the local and received generation numbers exceeds the 
> one-year threshold added for CASSANDRA-8113.  The system clocks are 
> up-to-date for all nodes.
> If this is a bug, the latest released Gossiper.java code in 2.1.x, 2.2.x, and 
> 3.0.x seems not to have changed the behavior that I'm seeing.
> I presume that restarting the remaining nodes will clear up the problem, 
> hence the minor priority.



