[
https://issues.apache.org/jira/browse/IGNITE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Goncharuk reassigned IGNITE-11394:
-----------------------------------------
Assignee: Alexey Goncharuk
> Infinite No next node in topology messages during node restart scenario
> -----------------------------------------------------------------------
>
> Key: IGNITE-11394
> URL: https://issues.apache.org/jira/browse/IGNITE-11394
> Project: Ignite
> Issue Type: Improvement
> Reporter: Alexey Goncharuk
> Assignee: Alexey Goncharuk
> Priority: Major
>
> I observe the following symptoms during a cyclic restart of nodes:
> - A node joining the cluster sends a join request, receives
> NodeAddedMessage and awaits NodeAddFinishedMessage
> - The node receives a metrics update message; the message stays in the queue
> - The whole cluster is restarted and a new ring is formed
> - The node re-sends the join request, which is successfully processed by the
> ring
> - The node added message is received by the joining node
> - The node detects that it cannot send messages (the set of failed nodes
> contains all remote nodes of the ring)
> - Since there was already a metrics update message in the queue, the node
> attempts to re-add the message to the queue. Because the metrics update
> message is a high-priority message, it is added to the head of the queue and
> the node gets stuck in an infinite loop
> I suggest dropping the metrics update message in {{sendMessageAcrossRing}} if
> we see the {{No next node in topology}} situation.
> Another question is why we don't pass the collection of failed nodes to the
> {{ring.hasRemoteNodes()}} method.
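
The loop described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Ignite's discovery code: the class `RingQueueSketch`, the `Type` enum, the `drain` method and the iteration cap are all hypothetical names introduced for illustration. It assumes every send attempt hits the "No next node in topology" case, that a high-priority metrics update is re-added to the head of the queue, and that any other message is simply consumed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the message worker queue; all names here are hypothetical.
class RingQueueSketch {
    enum Type { METRICS_UPDATE, NODE_ADD_FINISHED }

    /**
     * Processes the queue while no next node is reachable.
     * Returns the number of iterations performed, capped at maxIters
     * so that the "infinite" loop terminates in this sketch.
     */
    static int drain(Deque<Type> queue, boolean dropMetricsWhenNoNextNode, int maxIters) {
        int iters = 0;
        while (!queue.isEmpty() && iters < maxIters) {
            Type msg = queue.pollFirst();
            iters++;
            if (msg == Type.METRICS_UPDATE) {
                if (dropMetricsWhenNoNextNode)
                    continue;        // proposed behaviour: drop the message
                queue.addFirst(msg); // reported behaviour: re-add to the head
            }
            // Any other message is treated as consumed in this toy model.
        }
        return iters;
    }

    static Deque<Type> sampleQueue() {
        Deque<Type> q = new ArrayDeque<>();
        q.addLast(Type.METRICS_UPDATE);
        q.addLast(Type.NODE_ADD_FINISHED);
        return q;
    }

    public static void main(String[] args) {
        Deque<Type> stuck = sampleQueue();
        int itersStuck = drain(stuck, false, 1000);
        System.out.println("re-add: iterations=" + itersStuck + ", left=" + stuck.size());

        Deque<Type> fixed = sampleQueue();
        int itersFixed = drain(fixed, true, 1000);
        System.out.println("drop:   iterations=" + itersFixed + ", left=" + fixed.size());
    }
}
```

In this model the re-add variant uses up the whole iteration budget and never reaches the message queued behind the metrics update, while the drop variant empties the queue in two iterations.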