[ https://issues.apache.org/jira/browse/IGNITE-23391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Puchkovskiy updated IGNITE-23391:
---------------------------------------
    Description:
Let's imagine the following scenario:
# Cluster consists of nodes A, B; MG is on A, CMG is on B
# A is turned off
# User initiates MG repair via node B
# A starts
# 3 and 4 happen concurrently; at step 3, B does not see A in the physical topology, so A does not participate in the repair, but A's JoinRequest is processed before node B stops
# As a result, A has started the Metastorage and thinks it's in the old cluster
# At the same time, B restarts, gets CMG reset and MG repaired and it now also has a functioning Metastorage

There is a split brain in the Metastorage, even though both leaders of the new Metastorage used the same CMG for join.

We should make it impossible for node B to both allow a connection with A (allowing it to join) and not see it in the physical topology when initiating a cluster reset.

> Linearize addition to physical topology wrt cluster reset initiation
> --------------------------------------------------------------------
>
>                 Key: IGNITE-23391
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23391
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: iep-128, ignite-3
>
> Let's imagine the following scenario:
> # Cluster consists of nodes A, B; MG is on A, CMG is on B
> # A is turned off
> # User initiates MG repair via node B
> # A starts
> # 3 and 4 happen concurrently; at step 3, B does not see A in the physical
> topology, so A does not participate in the repair, but A's JoinRequest is
> processed before node B stops
> # As a result, A has started the Metastorage and thinks it's in the old
> cluster
> # At the same time, B restarts, gets CMG reset and MG repaired and it now
> also has a functioning Metastorage
> There is a split brain in the Metastorage, even though both leaders of the
> new Metastorage used the same CMG for join.
> We should make it impossible for node B to both allow a connection with A
> (allowing it to join) and not see it in the physical topology when initiating
> a cluster reset.
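
One possible way to read the requirement in the description, as a minimal sketch only: admit joining nodes and initiate the reset under the same lock, so that every node is either in the topology snapshot used for the reset or is refused the connection. The class and method names below (PhysicalTopologyGate, tryAdmit, initiateReset) are hypothetical and are not taken from the Ignite codebase; the actual fix may be structured quite differently.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch, not Ignite code: one lock orders "admit a joining node"
 * against "initiate cluster reset", so the two can never interleave.
 */
final class PhysicalTopologyGate {
    private final Object lock = new Object();

    /** Names of nodes currently admitted to the physical topology. */
    private final Set<String> physicalTopology = new LinkedHashSet<>();

    /** Set once a reset has been initiated; no node is admitted afterwards. */
    private boolean resetInitiated;

    /**
     * Called when a node tries to connect. Returns false if a reset has
     * already been initiated, in which case the node must not be allowed to join.
     */
    boolean tryAdmit(String nodeName) {
        synchronized (lock) {
            if (resetInitiated) {
                return false; // The node is not part of the reset snapshot: refuse it.
            }
            physicalTopology.add(nodeName);
            return true;
        }
    }

    /**
     * Closes the gate and returns a copy of the topology as seen at that moment.
     * Every node admitted so far is guaranteed to be in the returned set.
     */
    Set<String> initiateReset() {
        synchronized (lock) {
            resetInitiated = true;
            return new LinkedHashSet<>(physicalTopology);
        }
    }
}
```

In the scenario above, with B already admitted, either A's tryAdmit("A") runs before initiateReset() and A is included in the repair, or it runs after and A is refused; the interleaving from step 5 (A joins but is not seen in the topology) cannot occur in this sketch.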