[ https://issues.apache.org/jira/browse/IGNITE-27835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Egor Kuts reassigned IGNITE-27835:
----------------------------------

    Assignee: Egor Kuts

> Check majority in HA mechanism based on quorum size not replica factor 
> -----------------------------------------------------------------------
>
>                 Key: IGNITE-27835
>                 URL: https://issues.apache.org/jira/browse/IGNITE-27835
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mirza Aliev
>            Assignee: Egor Kuts
>            Priority: Major
>              Labels: ignite-3
>
> The HA mechanism reacts to every node-left event and checks whether the
> majority is lost in order to decide whether it should call resetPartitions.
> *Majority check logic*
> The majority-loss condition is:
> {code:java}
> if (stableAssignmentsWithOnlyAliveNodes(partitionId, revision).size()
>         < calculateQuorum(zoneDescriptor.replicas())) {
>     partitionsToReset.add(partId);
> }
> {code}
> where:
> {code:java}
> private static int calculateQuorum(int replicas) {
>     return replicas / 2 + 1;
> }
> {code}
> With replicas = Integer.MAX_VALUE (or any other very large value), the quorum
> becomes enormous (Integer.MAX_VALUE / 2 + 1 = 1073741824), so
> {{stableAssignmentsWithOnlyAliveNodes(...).size()}} is always less than the
> quorum. This means that any node-left event triggers resetPartitions, even
> when the cluster is still healthy and a real majority exists.
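A minimal, self-contained sketch of the current check, assuming the alive stable assignments can be modeled as a plain Set<String> of node names. The class QuorumCheckDemo and the method majorityLost are hypothetical stand-ins, not Ignite APIs; only calculateQuorum repeats the formula quoted above. It shows that with replicas = Integer.MAX_VALUE the check reports majority loss even when every node in the set is alive:

```java
import java.util.Set;

public class QuorumCheckDemo {
    // Same formula as calculateQuorum in the issue description.
    static int calculateQuorum(int replicas) {
        return replicas / 2 + 1;
    }

    // Hypothetical stand-in for the HA check: majority is considered lost when the
    // number of alive stable assignments is below the quorum derived from replicas.
    static boolean majorityLost(Set<String> aliveStableNodes, int replicas) {
        return aliveStableNodes.size() < calculateQuorum(replicas);
    }

    public static void main(String[] args) {
        Set<String> alive = Set.of("A", "B", "C");

        // replicas = 3: quorum is 2, three alive nodes, so no reset.
        System.out.println(majorityLost(alive, 3));                 // false

        // replicas = Integer.MAX_VALUE: quorum is 1073741824, so a reset is always triggered.
        System.out.println(calculateQuorum(Integer.MAX_VALUE));     // 1073741824
        System.out.println(majorityLost(alive, Integer.MAX_VALUE)); // true
    }
}
```

No overflow occurs here because the division happens before the addition, so the quorum is simply a value no real set of nodes can reach.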
> *Why this breaks things*
> At one step, resetPartitions resets the Raft group to the single most
> up-to-date node by applying a single-node configuration to that node.
> In our case, however, the remaining nodes are still alive and still form a
> majority under the old configuration. As a result, we effectively create two
> leaders:
> * one leader in the “new” single-node Raft group
> * another leader elected by the remaining nodes under the old configuration
>
> This leads to the Raft corruption that we see in the logs.
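The two-leader situation follows from plain majority arithmetic. A simplified sketch, assuming a hypothetical five-node stable assignment {A, B, C, D, E} in which A leaves and B is the most up-to-date node picked by resetPartitions (the node names, the class SplitBrainDemo and the helper hasMajority are illustrative only, not Ignite or JRaft code). B alone is a majority of its new single-node configuration, while C, D and E are still a majority of the old five-node configuration, and the two voter sets do not overlap:

```java
import java.util.Set;

public class SplitBrainDemo {
    // Simplified model: a set of voters can elect a leader if it contains
    // a strict majority of the members of the given configuration.
    static boolean hasMajority(Set<String> configuration, Set<String> voters) {
        long votesInConfig = voters.stream().filter(configuration::contains).count();
        return votesInConfig >= configuration.size() / 2 + 1;
    }

    public static void main(String[] args) {
        Set<String> oldConfig = Set.of("A", "B", "C", "D", "E");
        // After the reset, B is forced into a single-node configuration.
        Set<String> newConfig = Set.of("B");

        // B votes for itself in the new configuration: 1 >= 1.
        boolean leaderInNewConfig = hasMajority(newConfig, Set.of("B"));
        // C, D and E still follow the old configuration: 3 >= 3.
        boolean leaderInOldConfig = hasMajority(oldConfig, Set.of("C", "D", "E"));

        System.out.println(leaderInNewConfig && leaderInOldConfig); // true: two leaders
    }
}
```

The model ignores terms and log matching; it only illustrates why two disjoint groups can each satisfy a majority once they disagree on the configuration.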
> *Proposed fix*
> The HA majority check should not be derived from replicas. Instead, it should
> be based on the actual quorum size used for HA decisions: we need to retrieve
> the voting members from the stable assignments (isPeer=true) and compare the
> number of such nodes with the quorum size.
> However, this needs some design work, because the quorum size can still be
> configured to a large number while only a few nodes are actually started, and
> we need to define how the HA logic should behave in that situation.
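One possible shape of the proposed check, as a sketch only. It assumes a hypothetical Assignment record carrying a node name and an isPeer flag (mirroring the isPeer=true voting members mentioned above), that only alive voting members should be counted, and that quorumSize is passed in by the caller; QuorumBasedHaCheck, aliveVotingMembers and majorityLost are illustrative names, not Ignite APIs. How quorumSize should behave when it exceeds the number of started nodes is exactly the open design question, so the sketch does not try to answer it:

```java
import java.util.Set;

public class QuorumBasedHaCheck {
    // Hypothetical model of a stable assignment: node name plus voting flag.
    record Assignment(String consistentId, boolean isPeer) {}

    // Counts voting members (isPeer == true) of the stable assignments that are still alive.
    static long aliveVotingMembers(Set<Assignment> stable, Set<String> aliveNodes) {
        return stable.stream()
                .filter(Assignment::isPeer)
                .filter(a -> aliveNodes.contains(a.consistentId()))
                .count();
    }

    // Majority is considered lost only if fewer alive voting members remain than the
    // quorum size; the zone's replicas value is not used at all.
    static boolean majorityLost(Set<Assignment> stable, Set<String> aliveNodes, int quorumSize) {
        return aliveVotingMembers(stable, aliveNodes) < quorumSize;
    }

    public static void main(String[] args) {
        Set<Assignment> stable = Set.of(
                new Assignment("A", true),
                new Assignment("B", true),
                new Assignment("C", true),
                new Assignment("D", false)); // learner, does not vote

        // A leaves: B and C are still alive voting members, quorum size 2, so no reset.
        System.out.println(majorityLost(stable, Set.of("B", "C", "D"), 2)); // false
        // A and B leave: only C remains as a voter, so a reset is needed.
        System.out.println(majorityLost(stable, Set.of("C", "D"), 2));      // true
    }
}
```

Learners such as D are ignored by design, since they do not take part in Raft elections and therefore cannot restore a lost majority.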



--
This message was sent by Atlassian Jira
(v8.20.10#820010)