[ 
https://issues.apache.org/jira/browse/IGNITE-27835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev updated IGNITE-27835:
---------------------------------
    Description: 
The HA mechanism reacts to any node-left event and checks whether the majority 
is lost to decide whether it should call resetPartitions.

*Majority check logic*

The majority-loss condition is:

{code:java}
if (stableAssignmentsWithOnlyAliveNodes(partitionId, revision).size()
        < calculateQuorum(zoneDescriptor.replicas())) {
    partitionsToReset.add(partId);
}
{code}


where:

{code:java}
private static int calculateQuorum(int replicas) {
    return replicas / 2 + 1;
}
{code}

With {{replicas = Integer.MAX_VALUE}} or another huge value, the quorum becomes 
enormous, so {{stableAssignmentsWithOnlyAliveNodes(...).size()}} is always less 
than it. As a result, any node-left event triggers resetPartitions, even when 
the cluster is still healthy and a real majority exists.
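
The arithmetic above can be reproduced with a tiny standalone harness (this is illustrative code, not Ignite code; only calculateQuorum is copied from the issue):

{code:java}
public class QuorumMisfire {
    // Copied verbatim from the issue description.
    static int calculateQuorum(int replicas) {
        return replicas / 2 + 1;
    }

    public static void main(String[] args) {
        int aliveStableNodes = 5; // a healthy 5-node cluster, no real majority loss
        int quorum = calculateQuorum(Integer.MAX_VALUE);

        System.out.println(quorum);                    // 1073741824
        // "alive < quorum" holds for any realistic cluster size, so every
        // node-left event is classified as a majority loss.
        System.out.println(aliveStableNodes < quorum); // true
    }
}
{code}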

*Why this breaks things*
At some step, resetPartitions resets the Raft group to the single most 
up-to-date node by applying a single-node configuration to that node.

But in our case, the remaining nodes are still alive and still form a majority 
under the old configuration. As a result, we effectively create two leaders:

* one leader in the “new” single-node Raft group

* another leader elected by the remaining nodes under the old config

That leads to the Raft corruption that we see in the logs.

*Proposed fix*

The HA majority check should not be derived from replicas. Instead, it should 
be based on the actual quorum size used for HA decisions. 

However, this needs some design work, because quorum can still be configured to 
a large number while only a few nodes are actually started — and we need to 
define how HA logic should behave in that situation.
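
One possible direction, sketched only to make the ticket concrete (the name {{effectiveQuorum}} and the capping rule are hypothetical, not an agreed design, precisely because of the open question above): bound the HA majority threshold by the size of the actual stable assignment, so an inflated replicas/quorum setting can never exceed the number of nodes that really hold the partition.

{code:java}
public class HaQuorumSketch {
    /**
     * Hypothetical helper: majority over the nodes that actually hold the
     * partition, capped by whatever quorum was configured for the zone.
     */
    static int effectiveQuorum(int configuredQuorum, int stableAssignmentsSize) {
        return Math.min(configuredQuorum, stableAssignmentsSize / 2 + 1);
    }

    public static void main(String[] args) {
        // replicas = Integer.MAX_VALUE, but only 5 nodes in the stable assignment:
        int configured = Integer.MAX_VALUE / 2 + 1; // what calculateQuorum returns today
        System.out.println(effectiveQuorum(configured, 5)); // 3 - a sane majority
    }
}
{code}

Whether capping is the right behavior when quorum is deliberately configured larger than the started node count is exactly the design question that still needs to be answered.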

> Check majority in HA mechanism based on quorum size not replica factor 
> -----------------------------------------------------------------------
>
>                 Key: IGNITE-27835
>                 URL: https://issues.apache.org/jira/browse/IGNITE-27835
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mirza Aliev
>            Priority: Major
>              Labels: ignite-3
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
