[ 
https://issues.apache.org/jira/browse/IGNITE-23391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Puchkovskiy updated IGNITE-23391:
---------------------------------------
    Description: 
Let's imagine the following scenario:
 # The cluster consists of nodes A and B; the MG (Metastorage group) is on A, 
the CMG (Cluster Management Group) is on B
 # A is turned off
 # The user initiates MG repair via node B
 # A starts
 # Steps 3 and 4 happen concurrently: at step 3, B does not see A in the 
physical topology, so A does not participate in the repair, but A's 
JoinRequest is processed before node B stops
 # As a result, A starts its Metastorage and thinks it is still in the old 
cluster
 # Meanwhile, B restarts, gets its CMG reset and its MG repaired, and now also 
has a functioning Metastorage

The result is a split brain in the Metastorage, even though both leaders of 
the new Metastorage used the same CMG to join.

We should make it impossible for node B to both accept a connection from A 
(which allows A to join) and not see A in the physical topology when 
initiating a cluster reset.
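
The requirement above can be illustrated with a minimal sketch. It is not 
Ignite code: the class TopologyGate and the methods tryAdmit and 
initiateReset are hypothetical names invented for this example. It only 
assumes that admitting a node and initiating a reset can be guarded by one 
lock, so the two operations are totally ordered.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical illustration (not an Ignite API): admission of a node into
 * the physical topology and initiation of a cluster reset take the same
 * monitor, so each admission happens either entirely before the reset
 * (and is visible to it) or entirely after it (and is refused).
 */
final class TopologyGate {
    private final Set<String> physicalTopology = new HashSet<>();
    private boolean resetInitiated;

    /**
     * Called when a connection from a node is about to be accepted.
     * Returns false if a reset has already been initiated, in which case
     * the caller must reject the connection and the node cannot join.
     */
    synchronized boolean tryAdmit(String nodeName) {
        if (resetInitiated) {
            return false;
        }
        physicalTopology.add(nodeName);
        return true;
    }

    /**
     * Called when a cluster reset is initiated. Returns the set of nodes
     * that participate in the repair. Every node admitted so far is in the
     * returned set; no node can be admitted afterwards.
     */
    synchronized Set<String> initiateReset() {
        resetInitiated = true;
        return Collections.unmodifiableSet(new HashSet<>(physicalTopology));
    }
}
```

In the scenario above, A's connection would either be admitted before the 
reset, so that B sees A and A participates in the repair, or be rejected 
after it, so that A's JoinRequest is never processed by the old B.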

> Linearize addition to physical topology wrt cluster reset initiation
> --------------------------------------------------------------------
>
>                 Key: IGNITE-23391
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23391
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: iep-128, ignite-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
