[
https://issues.apache.org/jira/browse/IGNITE-23735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mirza Aliev updated IGNITE-23735:
---------------------------------
Description:
h3. Motivation
https://issues.apache.org/jira/browse/IGNITE-22904 implemented logic that
prevents leader hijacking. More details can be found in that ticket's
description. Briefly: when a node comes back after a majority reset, it might
still think it is a member of the voting set (judging by its local partition
Raft log), so it might propose itself as a candidate, and it can win the
election if there are enough such nodes. This results in the leadership being
hijacked by the 'old' majority, which corrupts the repaired partition's
majority.
Consider the following example:
# Replication factor is set to 3, assignments = peers = ABC; the index with
configuration ABC is 10.
# ABC nodes fail.
# The group is repaired, assignments = peers = CDE, with C as the leader.
# AB nodes recover.
# The replication factor changes to 5, assignments = ABCDE; the index with
configuration ABCDE is 20.
# C replicates the log to AB up to index 10 and then fails.
# AB assume the configuration is ABC and start an election, electing A as
leader before receiving vote requests from D or E.
As a result, A is elected leader, even though it is not the most up-to-date
node in terms of the log (since CDE has a more advanced log). This violates
Raft's invariants.
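The steps above can be sketched as a small self-contained simulation (the class and method names here are invented for illustration; this is not Ignite or JRaft code). It shows the two halves of the problem: D and E would correctly refuse to vote for A under Raft's log up-to-date check, yet A still wins because it counts votes against the stale {A, B, C} configuration:

```java
// Hypothetical demo of the hijack scenario; names are assumptions, not Ignite APIs.
public class LeaderHijackDemo {
    /**
     * Raft's vote-granting rule: a voter grants its vote only if the
     * candidate's log is at least as up-to-date as its own (higher last
     * term, or same last term and a last index that is not smaller).
     */
    public static boolean logUpToDate(long candTerm, long candIdx, long voterTerm, long voterIdx) {
        return candTerm > voterTerm || (candTerm == voterTerm && candIdx >= voterIdx);
    }

    /** Majority check against the configuration the candidate believes in. */
    public static boolean winsUnderConfig(int votesGranted, int configSize) {
        return votesGranted > configSize / 2;
    }

    public static void main(String[] args) {
        // A and B stopped at index 10; D and E hold the repaired log up to index 20.
        // D and E refuse to vote for A, since A's log (index 10) is behind theirs:
        System.out.println("D/E grant vote to A: " + logUpToDate(1, 10, 1, 20));

        // But A's local log still says the voting set is {A, B, C}, so A's own
        // vote plus B's already form a majority of that stale 3-node config:
        System.out.println("A elected under stale {A,B,C}: " + winsUnderConfig(2, 3));
    }
}
```

The point of the sketch: both sides behave "correctly" by local rules; it is A's stale view of the configuration that lets the old majority hijack leadership.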
In this ticket we must reuse the logic that sets a fake configuration when a
node receives data from a new Raft group (see
{{NodeImpl#refreshLeadershipAbstaining}}) and the logic of the externally
enforced configuration index (see
{{RaftGroupOptions#externallyEnforcedConfigIndex()}}), so that nodes that
restart and receive the Raft log won't try to elect a leader until all data is
replicated.
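The intended guard can be summarized in a minimal sketch (all names below are hypothetical, not the actual {{NodeImpl}} or {{RaftGroupOptions}} code): a recovering node abstains from starting elections until its applied index has caught up to the externally enforced configuration index, at which point it has replayed the repaired group's configuration and can no longer vote under the stale one:

```java
// Hypothetical sketch of the abstaining rule; not actual Ignite/JRaft code.
public class ElectionGuard {
    /** Index at which the repaired configuration was externally enforced (20 in the example). */
    private final long externallyEnforcedConfigIndex;

    /** Highest Raft log index this node has applied so far. */
    private long lastAppliedIndex;

    public ElectionGuard(long externallyEnforcedConfigIndex) {
        this.externallyEnforcedConfigIndex = externallyEnforcedConfigIndex;
    }

    /** Called as log entries are replicated and applied on the recovering node. */
    public void onLogApplied(long index) {
        lastAppliedIndex = Math.max(lastAppliedIndex, index);
    }

    /**
     * The node may start a pre-vote/election only after it has applied the
     * entry carrying the enforced configuration; before that it abstains,
     * since it would otherwise campaign under a stale voting set.
     */
    public boolean mayStartElection() {
        return lastAppliedIndex >= externallyEnforcedConfigIndex;
    }
}
```

In the scenario above, A would have {{externallyEnforcedConfigIndex = 20}} but only index 10 applied when C fails, so it would abstain instead of electing itself.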
h3. Definition of done
* Nodes that join after partition majority reset must not elect a leader from
the old majority that could hijack leadership and cause havoc in the repaired
group.
> resetPartitions improvements: leader hijack protection must be implemented
> --------------------------------------------------------------------------
>
> Key: IGNITE-23735
> URL: https://issues.apache.org/jira/browse/IGNITE-23735
> Project: Ignite
> Issue Type: Improvement
> Reporter: Mirza Aliev
> Priority: Major
> Labels: ignite-3
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)