[
https://issues.apache.org/jira/browse/CASSANDRA-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stefania updated CASSANDRA-7816:
--------------------------------
Attachment: cassandra_7816.txt
Submitting a patch for 2.0.
The duplicate DOWN notification is caused by
{{Gossiper.handleMajorStateChange}} passing the remote endpoint state to
{{StorageService.onRestart}}, which then incorrectly concludes that the node
was not previously marked down. I changed it to pass the local state instead,
if it is not null. If the local state is null, we do not call {{onRestart}} at
all. Please confirm that this does not introduce problems (I checked all
{{onRestart}} implementations and it looks OK to me).
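To illustrate the DOWN fix, here is a minimal, self-contained sketch. It is not the patch itself: the classes below are simplified stand-ins that only borrow the names {{Gossiper}}, {{EndpointState}}, {{handleMajorStateChange}} and {{onRestart}} from the description above, and the {{convict}} and {{onDown}} helpers are hypothetical. It assumes that {{onRestart}} pushes a DOWN event whenever the state it is given still looks alive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the fix; these are NOT the real Cassandra classes.
final class RestartNotificationSketch {

    static final class EndpointState {
        private boolean alive;
        EndpointState(boolean alive) { this.alive = alive; }
        boolean isAlive() { return alive; }
        void markDead() { alive = false; }
    }

    // Hypothetical subscriber that records the events it would push to clients.
    static final class Subscriber {
        final List<String> events = new ArrayList<>();

        void onDown(String endpoint) { events.add("DOWN " + endpoint); }

        // Assumption: a restart is reported as DOWN only if the node was
        // still believed to be alive according to the state passed in.
        void onRestart(String endpoint, EndpointState state) {
            if (state.isAlive()) {
                onDown(endpoint);
            }
        }
    }

    static final class Gossiper {
        final Map<String, EndpointState> endpointStateMap = new HashMap<>();
        final Subscriber subscriber;

        Gossiper(Subscriber subscriber) { this.subscriber = subscriber; }

        // Hypothetical helper: the failure detector marks a node down.
        void convict(String endpoint) {
            EndpointState local = endpointStateMap.get(endpoint);
            if (local != null && local.isAlive()) {
                local.markDead();
                subscriber.onDown(endpoint);
            }
        }

        void handleMajorStateChange(String endpoint, EndpointState remoteState) {
            // Use the LOCAL state, which remembers that the node was already
            // marked down. The remote state always looks alive.
            EndpointState localState = endpointStateMap.get(endpoint);
            if (localState != null) {
                subscriber.onRestart(endpoint, localState);
            }
            endpointStateMap.put(endpoint, remoteState);
        }
    }

    public static void main(String[] args) {
        Subscriber subscriber = new Subscriber();
        Gossiper gossiper = new Gossiper(subscriber);
        gossiper.endpointStateMap.put("10.0.0.1", new EndpointState(true));
        gossiper.convict("10.0.0.1");                                     // first DOWN
        gossiper.handleMajorStateChange("10.0.0.1", new EndpointState(true)); // no second DOWN
        System.out.println(subscriber.events);
    }
}
```

Passing {{remoteState}} to {{onRestart}} in this model would record a second DOWN event, because a freshly received remote state carries no memory of the earlier conviction.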
The multiple UP notifications are caused by the call to {{markAlive()}} in
{{Gossiper.applyStateLocally()}} when receiving multiple gossip messages.
Because {{markAlive()}} only marks the node as alive after receiving the reply
to an echo message (CASSANDRA-3533), there is a delay during which the node is
still not marked as alive. If gossip messages are received during this period,
we incorrectly call {{markAlive()}} multiple times in {{applyStateLocally()}}.
I fixed it by adding a flag to {{EndpointState}} and checking this flag in
{{markAlive()}}: if an echo is outstanding, we do not send another one until
we have received an answer. When there is a major state change,
{{markAlive()}} is called on the remote state, for which this flag is not set,
so we send another echo message even if we have not received a reply to a
previous echo request.
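The echo guard can be sketched in the same spirit. Again, this is a simplified model rather than the patch: the field name {{echoPending}}, the {{onEchoReply}} callback and the {{echoesSent}} counter are hypothetical, and the sketch assumes that every echo reply produces one UP notification, so limiting the echoes limits the notifications.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the echo guard; these are NOT the real Cassandra classes.
final class EchoGuardSketch {

    static final class EndpointState {
        boolean alive = false;
        boolean echoPending = false; // hypothetical name for the new flag
    }

    static final class Gossiper {
        final List<String> upEvents = new ArrayList<>();
        int echoesSent = 0;

        // Sends an echo unless one is already outstanding for this state.
        // Returns true if an echo was actually sent.
        boolean markAlive(String endpoint, EndpointState state) {
            if (state.echoPending) {
                return false; // still waiting for the previous reply
            }
            state.echoPending = true;
            echoesSent++;
            return true;
        }

        // Hypothetical callback invoked when the echo reply arrives.
        // Each reply marks the node alive and pushes one UP notification.
        void onEchoReply(String endpoint, EndpointState state) {
            state.echoPending = false;
            state.alive = true;
            upEvents.add("UP " + endpoint);
        }

        // Called for every gossip message that carries state for the endpoint.
        void applyStateLocally(String endpoint, EndpointState state) {
            if (!state.alive) {
                markAlive(endpoint, state);
            }
        }
    }

    public static void main(String[] args) {
        Gossiper gossiper = new Gossiper();
        EndpointState state = new EndpointState();
        // Three gossip messages arrive before the echo reply.
        gossiper.applyStateLocally("10.0.0.1", state);
        gossiper.applyStateLocally("10.0.0.1", state);
        gossiper.applyStateLocally("10.0.0.1", state);
        gossiper.onEchoReply("10.0.0.1", state);
        System.out.println(gossiper.echoesSent + " echo(es), events: " + gossiper.upEvents);
    }
}
```

Because the flag lives on the state object, a fresh remote state received on a major state change starts with the flag unset, so {{markAlive}} sends a new echo for it regardless of any echo still outstanding on the old state.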
> Duplicate DOWN/UP Events Pushed with Native Protocol
> ----------------------------------------------------
>
> Key: CASSANDRA-7816
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7816
> Project: Cassandra
> Issue Type: Bug
> Components: API
> Reporter: Michael Penick
> Assignee: Stefania
> Priority: Minor
> Fix For: 2.0.13, 2.1.4
>
> Attachments: cassandra_7816.txt, tcpdump_repeating_status_change.txt,
> trunk-7816.txt
>
>
> Added "MOVED_NODE" as a possible type of topology change and also specified
> that it is possible to receive the same event multiple times.