[
https://issues.apache.org/jira/browse/IGNITE-27835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Egor Kuts reassigned IGNITE-27835:
----------------------------------
Assignee: Egor Kuts
> Check majority in HA mechanism based on quorum size not replica factor
> -----------------------------------------------------------------------
>
> Key: IGNITE-27835
> URL: https://issues.apache.org/jira/browse/IGNITE-27835
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Assignee: Egor Kuts
> Priority: Major
> Labels: ignite-3
>
> The HA mechanism reacts to any node-left event and checks whether the
> majority is lost to decide whether it should call resetPartitions.
> *Majority check logic*
> The majority-loss condition is:
> {code:java}
> if (stableAssignmentsWithOnlyAliveNodes(partitionId, revision).size()
>         < calculateQuorum(zoneDescriptor.replicas())) {
>     partitionsToReset.add(partId);
> }
> {code}
> where:
> {code:java}
> private static int calculateQuorum(int replicas) {
>     return replicas / 2 + 1;
> }
> {code}
> With replicas = Integer.MAX_VALUE or another very large number, the computed
> quorum becomes enormous, so
> {{stableAssignmentsWithOnlyAliveNodes(...).size()}} is always less than it.
> As a result, any node-left event triggers resetPartitions, even when the
> cluster is still healthy and a real majority exists.
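The arithmetic behind this can be checked with a minimal standalone sketch. Only the calculateQuorum formula is taken from the description above; the class QuorumOverflowDemo and the helper isMajorityLost are hypothetical names used for illustration and are not part of the Ignite code base.

```java
// Illustration only: QuorumOverflowDemo and isMajorityLost are hypothetical names.
class QuorumOverflowDemo {
    // Same formula as calculateQuorum in the issue description.
    static int calculateQuorum(int replicas) {
        return replicas / 2 + 1;
    }

    // Mirrors the majority-loss condition: the number of alive stable nodes
    // is compared with a quorum derived from the configured replica count.
    static boolean isMajorityLost(int aliveStableNodes, int replicas) {
        return aliveStableNodes < calculateQuorum(replicas);
    }

    public static void main(String[] args) {
        // replicas = 3, two nodes alive: quorum is 2, so the majority is kept.
        System.out.println(isMajorityLost(2, 3)); // false

        // replicas = Integer.MAX_VALUE: quorum is 1073741824, which no real
        // cluster can reach, so the same two alive nodes count as a lost majority.
        System.out.println(calculateQuorum(Integer.MAX_VALUE)); // 1073741824
        System.out.println(isMajorityLost(2, Integer.MAX_VALUE)); // true
    }
}
```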
> *Why this breaks things*
> At some step, resetPartitions resets the Raft group to the single most
> up-to-date node by applying a single-node configuration to that node.
> But in our case, the remaining nodes are still alive and still form a
> majority under the old configuration. As a result, we effectively create two
> leaders:
> * one leader in the “new” single-node Raft group
> * another leader elected by the remaining nodes under the old config
> This leads to the Raft corruption that we see in the logs.
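The two-leader situation can be sketched with a deliberately simplified model. It only counts votes against a configuration and ignores terms, log comparison and everything else a real Raft election involves. The node names, the five-node group and the class SplitBrainDemo are assumptions made up for this example.

```java
import java.util.Set;

// Illustration only: a vote-counting model, not a Raft implementation.
class SplitBrainDemo {
    // A candidate can win if the voters that belong to the configuration
    // form a strict majority of that configuration.
    static boolean canElectLeader(Set<String> config, Set<String> voters) {
        long votes = voters.stream().filter(config::contains).count();
        return votes >= config.size() / 2 + 1;
    }

    public static void main(String[] args) {
        // Hypothetical group of five nodes; node E has left the cluster.
        Set<String> oldConfig = Set.of("A", "B", "C", "D", "E");

        // resetPartitions forces a single-node configuration onto node A.
        Set<String> resetConfig = Set.of("A");

        // B, C and D are still alive and still use the old configuration.
        Set<String> remaining = Set.of("B", "C", "D");

        // Both sides can elect a leader independently of each other.
        System.out.println(canElectLeader(resetConfig, Set.of("A"))); // true
        System.out.println(canElectLeader(oldConfig, remaining)); // true
    }
}
```

Because each side satisfies its own majority rule, neither has a reason to step down, which matches the two-leader outcome described above.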
> *Proposed fix*
> The HA majority check should not be derived from replicas. Instead, it should
> be based on the actual quorum size used for HA decisions. We need to retrieve
> the voting members from the stable assignments (isPeer=true) and compare the
> number of such nodes with the quorum size.
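One possible shape of such a check is sketched below, under stated assumptions. The Assignment class, its consistentId and isPeer fields, and the quorumSize parameter are hypothetical stand-ins; the sketch does not use the real Ignite types and does not say where the quorum size should come from.

```java
import java.util.List;
import java.util.Set;

// Illustration only: hypothetical types, not the Ignite API.
class StableQuorumCheckDemo {
    // Hypothetical stand-in for a stable assignment entry.
    static final class Assignment {
        final String consistentId;
        final boolean isPeer; // true for voting members, false for learners

        Assignment(String consistentId, boolean isPeer) {
            this.consistentId = consistentId;
            this.isPeer = isPeer;
        }
    }

    // Majority is lost when fewer alive voting members remain than quorumSize.
    static boolean isMajorityLost(List<Assignment> stable, Set<String> aliveNodes, int quorumSize) {
        long alivePeers = stable.stream()
                .filter(a -> a.isPeer)
                .filter(a -> aliveNodes.contains(a.consistentId))
                .count();
        return alivePeers < quorumSize;
    }

    public static void main(String[] args) {
        List<Assignment> stable = List.of(
                new Assignment("A", true),
                new Assignment("B", true),
                new Assignment("C", true),
                new Assignment("D", false)); // learner, does not vote

        // Two of three peers alive, quorum size 2: majority is kept.
        System.out.println(isMajorityLost(stable, Set.of("A", "B", "D"), 2)); // false
        // Only one peer alive: majority is lost, the learner does not count.
        System.out.println(isMajorityLost(stable, Set.of("A", "D"), 2)); // true
    }
}
```

Counting only entries with isPeer=true keeps learners out of the comparison, and the replica count no longer appears in the condition at all.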
> However, this needs some design work, because quorum can still be configured
> to a large number while only a few nodes are actually started, and we need
> to define how HA logic should behave in that situation.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)