Roman Puchkovskiy created IGNITE-18712:
------------------------------------------

             Summary: Do not allow a node excluded from Physical Topology to 
enter the topology again
                 Key: IGNITE-18712
                 URL: https://issues.apache.org/jira/browse/IGNITE-18712
             Project: Ignite
          Issue Type: New Feature
            Reporter: Roman Puchkovskiy
            Assignee: Roman Puchkovskiy
             Fix For: 3.0.0-beta2


The following scenario is possible:
 # Node X is a part of the Physical Topology (PT)
 # Its network cable gets unplugged, but node X stays alive
 # After the relevant timeouts expire, the other nodes remove node X from the
PT, so their {{MessagingServices}} drop the messages not yet delivered to node X
 # The network cable gets plugged in again, so node X attempts to enter the PT
with the same old ID (aka Launch ID)

If we allow it to enter the PT again, some messages sent to node X by other
nodes may have been lost, and node X will never learn about it. Its memory
might still hold state left by a process that expects those messages to be
delivered later, so some invariants might break.

To prevent such a situation, the node must be refused entry: the connection
must be terminated during the handshake attempt. This has to be done both in
{{RecoveryServerHandshakeManager}} and {{RecoveryClientHandshakeManager}}.

When a node's connection attempt is refused, the refusing node must first send
an explanatory message (like 'your ID is stale') and then close the physical
connection.
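
The "explain first, then close" order can be sketched as follows. This is only
an illustration, not the actual handshake manager code: {{HandshakeChannel}},
{{StaleIdHandshakeCheck}} and the message text are hypothetical names, and
launch IDs are assumed to be plain strings.

```java
import java.util.Set;

/** Hypothetical minimal view of a connection during the handshake. */
interface HandshakeChannel {
    void send(String message);

    void close();
}

/** Hypothetical check that a handshake manager could run on the remote launch ID. */
class StaleIdHandshakeCheck {
    private final Set<String> staleLaunchIds;

    StaleIdHandshakeCheck(Set<String> staleLaunchIds) {
        this.staleLaunchIds = staleLaunchIds;
    }

    /**
     * Returns true if the handshake may proceed. For a stale ID, sends the
     * explanation first and only then closes the connection, returning false.
     */
    boolean acceptOrReject(String remoteLaunchId, HandshakeChannel channel) {
        if (staleLaunchIds.contains(remoteLaunchId)) {
            channel.send("Handshake rejected: launch ID " + remoteLaunchId + " is stale");
            channel.close();
            return false;
        }
        return true;
    }
}
```

The same check would have to run on both the server and the client side of the
handshake, since either side may be the one that sees the stale ID.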

The refused node must take measures to refresh its identity (like initiating a 
critical failure using a Failure Handler).
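
One possible shape of the reaction on the refused node is sketched below. All
names here are hypothetical, and the Failure Handler is modelled as a plain
{{Consumer<Throwable>}} rather than the real Ignite interface.

```java
import java.util.function.Consumer;

/** Hypothetical exception describing why the handshake was refused. */
class StaleIdRejectedException extends RuntimeException {
    StaleIdRejectedException(String reason) {
        super(reason);
    }
}

/** Hypothetical listener that escalates a rejection to a failure handler. */
class StaleIdRejectionListener {
    private final Consumer<Throwable> failureHandler;

    StaleIdRejectionListener(Consumer<Throwable> failureHandler) {
        this.failureHandler = failureHandler;
    }

    /** Called with the outcome of a handshake; escalates only rejections. */
    void onHandshakeResult(boolean rejected, String reason) {
        if (rejected) {
            // Treat the rejection as critical: the node cannot keep its old identity.
            failureHandler.accept(new StaleIdRejectedException(reason));
        }
    }
}
```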

A subtle point is how we persist the fact that some node ID is stale. For
starters, we could make this information volatile (only keep it in memory), but
later we could record this information using the CMG.
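
A minimal sketch of the volatile variant, assuming launch IDs are strings. The
class and method names are hypothetical, and the set is lost on restart, which
is exactly the limitation that a CMG-backed record would remove.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory registry of launch IDs removed from the PT. */
class VolatileStaleIds {
    // Concurrent set: topology events and handshakes may run on different threads.
    private final Set<String> staleLaunchIds = ConcurrentHashMap.newKeySet();

    /** Records that the node with this launch ID was removed from the PT. */
    void markStale(String launchId) {
        staleLaunchIds.add(launchId);
    }

    /** Tells whether a node with this launch ID must be refused on handshake. */
    boolean isStale(String launchId) {
        return staleLaunchIds.contains(launchId);
    }
}
```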

Please do not confuse this issue with IGNITE-18685, which was caused by a
rejected attempt to fix the same problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
