[ https://issues.apache.org/jira/browse/IGNITE-24818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladimir Pligin reassigned IGNITE-24818:
----------------------------------------
Assignee: Kirill Sizov
> Possible rejection of full transactions
> ---------------------------------------
>
> Key: IGNITE-24818
> URL: https://issues.apache.org/jira/browse/IGNITE-24818
> Project: Ignite
> Issue Type: Bug
> Reporter: Denis Chudov
> Assignee: Kirill Sizov
> Priority: Major
> Labels: ignite-3
>
> A full transaction requires passing the lease start time in the command. The
> FSM compares it to the current lease start time saved in the storage in
> order to linearize full transactions and primary replica changes. In
> in-memory groups, this lease start time may be lost.
> Consider the case:
> * the primary replica is elected and written to the meta storage, and its
> start time is written into the in-memory partition storage;
> * the in-memory storage loses the data because some nodes restart (including
> the cases where the majority is not lost and the group is still operable, as
> described in IGNITE-24772);
> * a new leader may be elected on another node; this doesn't matter because
> the primary replica is not always colocated with the leader;
> * a full transaction tries to commit data but is rejected (or an error
> occurs) because the lease start time was lost from the partition storage. Full
> transactions will be inoperable for this group until a new primary is
> elected.
> The problem is also relevant for HA and disaster recovery scenarios.
> We will probably need to stop lease prolongation after performing
> resetPartition in any zone (either in-memory or persistent).
> *UPD:*
> After refining this task, we agreed that there is no such problem for the HA
> and manual reset tracks. The only remaining action is to find or create a test
> that verifies that the primary replica is changed if the previous primary
> replica is out of the assignments. With this functionality, we can be sure
> that in the HA case, if the primary replica was out of the assignments or was
> stopped, a new startTime is propagated to the partition when the previous
> startTime was lost during majority loss.
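
The failure mode described above can be illustrated with a minimal sketch. It is not taken from the Ignite code base: the class LeaseStartTimeCheck, its PartitionStorage stand-in, the Result values, and the validate method are all hypothetical names, and the exact rejection behavior is an assumption based only on the description in this issue.

```java
import java.util.OptionalLong;

/** Hypothetical sketch of the lease start time check; not the actual Ignite implementation. */
public class LeaseStartTimeCheck {
    /** Assumed outcomes of validating a full-transaction command. */
    public enum Result { ACCEPTED, REJECTED_STALE_LEASE, REJECTED_LEASE_UNKNOWN }

    /** Stand-in for a partition storage that keeps the current lease start time. */
    public static final class PartitionStorage {
        private OptionalLong leaseStartTime = OptionalLong.empty();

        /** Called when a primary replica is elected and its lease start time is stored. */
        public void saveLeaseStartTime(long startTime) {
            leaseStartTime = OptionalLong.of(startTime);
        }

        /** Models an in-memory storage losing its data after node restarts. */
        public void loseData() {
            leaseStartTime = OptionalLong.empty();
        }

        public OptionalLong leaseStartTime() {
            return leaseStartTime;
        }
    }

    /** Compares the lease start time carried by the command with the stored one. */
    public static Result validate(PartitionStorage storage, long commandLeaseStartTime) {
        OptionalLong stored = storage.leaseStartTime();
        if (!stored.isPresent()) {
            // The stored value was lost, so there is nothing to compare against.
            return Result.REJECTED_LEASE_UNKNOWN;
        }
        return stored.getAsLong() == commandLeaseStartTime
                ? Result.ACCEPTED
                : Result.REJECTED_STALE_LEASE;
    }

    public static void main(String[] args) {
        PartitionStorage storage = new PartitionStorage();
        storage.saveLeaseStartTime(100L);
        System.out.println(validate(storage, 100L)); // lease is known and matches

        storage.loseData();
        System.out.println(validate(storage, 100L)); // same lease, but the stored value is gone

        storage.saveLeaseStartTime(200L);            // a new primary is elected
        System.out.println(validate(storage, 200L));
    }
}
```

In this sketch, full transactions keep failing after loseData() until a new start time is saved, which mirrors the statement that the group stays inoperable for full transactions until a new primary is elected.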
--
This message was sent by Atlassian Jira
(v8.20.10#820010)