[
https://issues.apache.org/jira/browse/IGNITE-23559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexander Lapin updated IGNITE-23559:
-------------------------------------
Epic Link: IGNITE-23694 (was: IGNITE-23438)
> resetPartitions improvements: two phase reset
> ---------------------------------------------
>
> Key: IGNITE-23559
> URL: https://issues.apache.org/jira/browse/IGNITE-23559
> Project: Ignite
> Issue Type: Improvement
> Reporter: Mirza Aliev
> Assignee: Kirill Sizov
> Priority: Major
> Labels: ignite-3
> Time Spent: 10m
> Remaining Estimate: 0h
>
> h3. Motivation
> According to
> [IEP-131|https://cwiki.apache.org/confluence/display/IGNITE/IEP-131%3A+Partition+Majority+Unavailability+Handling]
> {{DisasterRecoveryManager#resetPartitions}} will be the core method for
> recovering partition majority availability. If only nodes [A, B, C] are alive
> from the partition assignments, the {{NodeImpl#resetPeers}} mechanism will be
> called with the target topology [A, B, C].
> This approach is wrong: if {{NodeImpl#resetPeers}} with [A, B, C] is called
> on nodes A, B, C, and {{NodeImpl#resetPeers}} with [A] is called on node A,
> we could end up with two leaders: one elected by B and C, because they form
> a majority of [A, B, C], and leader A elected from the single-node
> configuration [A].
> To resolve this problem, {{DisasterRecoveryManager#resetPartitions}} must
> work in two phases: the first phase must call {{NodeImpl#resetPeers}} with a
> configuration consisting of a single node, the one with the most up-to-date
> log (in our example it could be A), and after that a rebalance to [A, B, C]
> must be scheduled.
> h3. Definition of Done
> * resetPartitions must work in two phases: reset to the single most
> up-to-date node, then schedule a new rebalance with a targetTopology
> consisting of the alive nodes from the current stable assignments
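The two-phase flow above can be sketched in plain Java. This is only an illustrative sketch: {{resetPeers}} and {{scheduleRebalance}} below are hypothetical stand-ins, not the real {{NodeImpl}} or {{DisasterRecoveryManager}} API, and the log indexes are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Hedged sketch of the two-phase reset: phase 1 forces a single-node
 * configuration on the most up-to-date alive node, phase 2 schedules a
 * rebalance back to all alive nodes. Helper names are placeholders.
 */
public class TwoPhaseReset {
    static List<String> resetPartitions(Map<String, Long> aliveNodeLogIndexes) {
        // Phase 1: pick the alive node with the highest log index and reset
        // the group to that single-node configuration, so only one node can
        // become leader.
        String mostUpToDate = aliveNodeLogIndexes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .orElseThrow()
                .getKey();
        resetPeers(List.of(mostUpToDate));

        // Phase 2: schedule a rebalance whose target topology is the full
        // set of alive nodes from the current stable assignments.
        List<String> targetTopology = new ArrayList<>(aliveNodeLogIndexes.keySet());
        Collections.sort(targetTopology);
        scheduleRebalance(targetTopology);
        return targetTopology;
    }

    // Placeholder for the forced single-node reconfiguration.
    static void resetPeers(List<String> peers) {
        System.out.println("resetPeers -> " + peers);
    }

    // Placeholder for scheduling the follow-up rebalance.
    static void scheduleRebalance(List<String> targetTopology) {
        System.out.println("scheduleRebalance -> " + targetTopology);
    }

    public static void main(String[] args) {
        // Matches the ticket's example: A has the most up-to-date log.
        resetPartitions(Map.of("A", 120L, "B", 100L, "C", 90L));
    }
}
```

Because phase 1 collapses the configuration to a single peer, the split-leader scenario described above cannot occur: no other subset of nodes can form a majority of the one-node configuration.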
--
This message was sent by Atlassian Jira
(v8.20.10#820010)