[
https://issues.apache.org/jira/browse/CASSANDRA-13700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094748#comment-16094748
]
Jason Brown commented on CASSANDRA-13700:
-----------------------------------------
We also might need to make {{HeartBeatState.version}} volatile, but I'm still
thinking about it (just adding it here for discussion).
> Heartbeats can cause gossip information to go permanently missing on certain
> nodes
> ----------------------------------------------------------------------------------
>
> Key: CASSANDRA-13700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13700
> Project: Cassandra
> Issue Type: Bug
> Components: Distributed Metadata
> Reporter: Joel Knighton
> Assignee: Joel Knighton
> Priority: Critical
>
> In {{Gossiper.getStateForVersionBiggerThan}}, we add the {{HeartBeatState}}
> from the corresponding {{EndpointState}} to the {{EndpointState}} to send.
> When we're getting state for ourselves, this means that we add a reference to
> the local {{HeartBeatState}}. Then, once we've built a message (in either the
> Syn or Ack handler), we send it through the {{MessagingService}}. In the case
> that the {{MessagingService}} is sufficiently slow, the {{GossipTask}} may
> run before serialization of the Ack or Ack2. This means that when the
> {{GossipTask}} acquires the gossip {{taskLock}}, it may increment the
> {{HeartBeatState}} version of the local node as stored in the endpoint state
> map. Then, when we finally serialize the Ack or Ack2, we'll follow the
> reference to the {{HeartBeatState}} and serialize it with a higher version
> than we saw when constructing the Ack or Ack2.
> Consider the case where we see {{HeartBeatState}} with version 4 when
> constructing an Ack and send it through the {{MessagingService}}. Then, we
> add some piece of state with version 5 to our local {{EndpointState}}. If
> {{GossipTask}} runs and increases the {{HeartBeatState}} version to 6 before
> the {{MessageOut}} containing the Ack is serialized, the node receiving the
> Ack will believe it is current to version 6, despite the fact that it has
> never received a message containing the {{ApplicationState}} tagged with
> version 5.
> I've reproduced this in several versions; so far, I believe this is
> possible in all versions.
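
The version 4 / 5 / 6 scenario described above can be replayed deterministically in a few lines. This is only a minimal sketch: {{HeartBeat}}, {{PendingAck}} and the method names are hypothetical stand-ins invented for illustration, not Cassandra's classes, and the race is simulated by ordering the steps by hand rather than with real threads. The defensive copy shown in the second method is one possible way to pin the version at construction time; it is not necessarily the fix adopted for this ticket.

```java
// Hypothetical stand-ins for illustration only; these are not Cassandra's classes.
final class HeartbeatAliasingSketch {

    /** Simplified heartbeat: just a mutable version counter. */
    static final class HeartBeat {
        private int version;
        HeartBeat(int version) { this.version = version; }
        int version() { return version; }
        void bump(int newVersion) { version = newVersion; }
        HeartBeat copy() { return new HeartBeat(version); }
    }

    /** A queued message that is serialized some time after it is built. */
    static final class PendingAck {
        private final HeartBeat heartBeat;
        PendingAck(HeartBeat heartBeat) { this.heartBeat = heartBeat; }
        // "Serialization" reads the version at send time, not at build time.
        int serializeVersion() { return heartBeat.version(); }
    }

    /** The message holds a reference to the live, local heartbeat. */
    static int versionSentWhenSharingReference() {
        HeartBeat local = new HeartBeat(4);
        PendingAck ack = new PendingAck(local); // built while version is 4
        // State tagged with version 5 is added locally; the ack does not carry it.
        local.bump(6);                          // gossip task runs before serialization
        return ack.serializeVersion();          // follows the reference
    }

    /** The message holds a snapshot taken when it was built. */
    static int versionSentWhenCopying() {
        HeartBeat local = new HeartBeat(4);
        PendingAck ack = new PendingAck(local.copy()); // snapshot at version 4
        local.bump(6);
        return ack.serializeVersion();
    }

    public static void main(String[] args) {
        System.out.println("shared reference: " + versionSentWhenSharingReference());
        System.out.println("defensive copy:   " + versionSentWhenCopying());
    }
}
```

With the shared reference the receiver is told version 6 even though the message was built at version 4 and never carried the version 5 state; with the snapshot it is told 4 and would still ask for anything newer.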
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)