Mirza Aliev created IGNITE-23559:
------------------------------------
Summary: resetPartitions improvements: two phase reset
Key: IGNITE-23559
URL: https://issues.apache.org/jira/browse/IGNITE-23559
Project: Ignite
Issue Type: Improvement
Reporter: Mirza Aliev
resetPartitions improvements: two-phase reset. First reset to the single most
up-to-date node, then schedule a new rebalance whose targetTopology is the set
of alive nodes from the current stable assignments.
h3. Motivation
According to
[IEP-131|https://cwiki.apache.org/confluence/display/IGNITE/IEP-131%3A+Partition+Majority+Unavailability+Handling],
{{DisasterRecoveryManager#resetPartitions}} will be the core method for
recovering partition majority availability. If only nodes [A, B, C] are alive
from the partition assignments, the {{NodeImpl#resetPeers}} mechanism will be
called with the target topology [A, B, C].
This approach is wrong: if {{NodeImpl#resetPeers}} with [A, B, C] is called on
nodes A, B, C and {{NodeImpl#resetPeers}} with [A] is called on node A, we
could end up with two leaders: one elected by B and C, because they form a
majority of [A, B, C], and leader A from the configuration [A].
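The hazard can be illustrated with a minimal sketch. It assumes a simplified rule in which a leader can be elected when the voters all belong to the node's configuration and form a strict majority of it. The class and method names are hypothetical and are not part of Ignite or its Raft implementation.

```java
import java.util.Set;

/** Illustrative only: a simplified majority check, not Ignite's Raft code. */
final class SplitBrainSketch {
    /** True if every voter belongs to the configuration and the voters form a strict majority of it. */
    static boolean canElectLeader(Set<String> config, Set<String> voters) {
        return config.containsAll(voters) && voters.size() > config.size() / 2;
    }

    public static void main(String[] args) {
        // Hypothetical state: node A was reset to [A], nodes B and C were reset to [A, B, C].
        Set<String> configOnA = Set.of("A");
        Set<String> configOnBAndC = Set.of("A", "B", "C");

        // A alone is a majority of [A].
        boolean aCanLead = canElectLeader(configOnA, Set.of("A"));
        // B and C together are a majority of [A, B, C], without any vote from A.
        boolean bOrCCanLead = canElectLeader(configOnBAndC, Set.of("B", "C"));

        System.out.println("A can lead under [A]: " + aCanLead);
        System.out.println("B or C can lead under [A, B, C]: " + bOrCCanLead);
    }
}
```

Both checks succeed with disjoint voter sets, which is the two-leader situation described above.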
To resolve this problem, {{DisasterRecoveryManager#resetPartitions}} must work
in two phases: the first phase must call {{NodeImpl#resetPeers}} only with a
configuration consisting of a single node (the one with the most up-to-date
log; in our example it could be A), and after that a rebalance to [A, B, C]
must be scheduled.
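A rough sketch of how the two phases could be planned is shown below. It is not the proposed implementation: {{TwoPhaseResetSketch}}, {{ResetPlan}} and {{plan}} are hypothetical names, "most up-to-date" is reduced to the highest log index (real Raft logic also compares terms), and ties are broken by node name only to keep the result deterministic.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: plans the two phases, it does not call any Ignite API. */
final class TwoPhaseResetSketch {
    /** Hypothetical holder for the outcome of planning. */
    static final class ResetPlan {
        final Set<String> phaseOnePeers;   // configuration for the resetPeers call
        final Set<String> rebalanceTarget; // targetTopology for the scheduled rebalance

        ResetPlan(Set<String> phaseOnePeers, Set<String> rebalanceTarget) {
            this.phaseOnePeers = phaseOnePeers;
            this.rebalanceTarget = rebalanceTarget;
        }
    }

    /**
     * @param stableAssignments nodes in the current stable assignments
     * @param aliveLogIndex last log index reported by each alive node (assumed input)
     */
    static ResetPlan plan(Set<String> stableAssignments, Map<String, Long> aliveLogIndex) {
        // Phase 2 target: alive nodes from the current stable assignments.
        Set<String> target = new TreeSet<>(stableAssignments);
        target.retainAll(aliveLogIndex.keySet());
        if (target.isEmpty()) {
            throw new IllegalStateException("No alive node in the stable assignments");
        }

        // Phase 1 peer: highest log index; on a tie the lexicographically smallest name wins.
        String mostUpToDate = target.stream()
            .max(Comparator.comparingLong((String n) -> aliveLogIndex.get(n))
                .thenComparing(Comparator.reverseOrder()))
            .orElseThrow();

        return new ResetPlan(Set.of(mostUpToDate), target);
    }

    public static void main(String[] args) {
        // Hypothetical data: D and E are dead, A has the longest log.
        ResetPlan p = plan(Set.of("A", "B", "C", "D", "E"), Map.of("A", 42L, "B", 40L, "C", 37L));
        System.out.println("Phase 1 resetPeers configuration: " + p.phaseOnePeers);
        System.out.println("Phase 2 rebalance target: " + p.rebalanceTarget);
    }
}
```

Because phase 1 uses a single-node configuration on every node, only one leader can emerge before the rebalance extends the group to the remaining alive nodes.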
h3. Definition of Done
* resetPartitions must work in two phases: reset to the single most up-to-date
node, then schedule a new rebalance whose targetTopology is the set of alive
nodes from the current stable assignments.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)