[
https://issues.apache.org/jira/browse/IGNITE-21588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kirill Tkalenko updated IGNITE-21588:
-------------------------------------
Reviewer: Kirill Tkalenko
> CMG commands idempotency is broken
> ----------------------------------
>
> Key: IGNITE-21588
> URL: https://issues.apache.org/jira/browse/IGNITE-21588
> Project: Ignite
> Issue Type: Bug
> Reporter: Ivan Bessonov
> Assignee: Philipp Shergalis
> Priority: Major
> Labels: ignite-3
> Fix For: 3.0.0-beta2
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When handling commands like {{JoinReadyCommand}} and {{NodesLeaveCommand}} we
> do the following:
> * Read local state with {{readLogicalTopology()}}.
> * Modify state according to the command.
> * *Increase version*.
> * Write new state with {{saveSnapshotToStorage(snapshot)}}.
> The problem lies in the reading and writing of the state: it's local, and the
> version value is not replicated.
> What happens when we restart the node:
> * It starts without a local storage snapshot, with appliedIndex == 0, which is
> a *state in the past*.
> * We apply commands that were already applied before the restart.
> * We apply these commands to the locally saved topology snapshot.
> * This logical topology snapshot has a *state in the future* when compared
> to appliedIndex == 0.
> * As a result, when we re-apply some commands, we *increase the version* one
> more time, thus breaking data consistency between nodes.
> This would have been fine if we only used this version locally, but
> distribution zones rely on the version being consistent across all nodes
> in the cluster. This might break DZ data nodes handling if any of the cluster
> nodes restarts.
> How to fix:
> * Either drop the storage if there's no storage snapshot, which will restore
> consistency,
> * or never start the CMG group from a snapshot, but rather start it from the
> latest storage data.
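
The replay problem and the first fix option above can be illustrated with a minimal sketch. This is not the actual Ignite code: {{CmgReplaySketch}}, {{LocalTopology}}, {{applyJoin}} and {{replayFromStart}} are hypothetical names, the log is reduced to a list of node names, and the handler is assumed to increase the version unconditionally, as the description states.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CmgReplaySketch {
    /** Hypothetical stand-in for the node-local logical topology storage. */
    static final class LocalTopology {
        long version = 0;
        final Set<String> nodes = new TreeSet<>();

        void clear() {
            version = 0;
            nodes.clear();
        }
    }

    /** Simplified handler: read state, modify it, increase the version, write it back. */
    static void applyJoin(LocalTopology topology, String nodeName) {
        topology.nodes.add(nodeName);
        topology.version++; // Unconditional, so re-applying the same command bumps it again.
    }

    /** Replays the whole log, as happens when appliedIndex == 0 after a restart. */
    static long replayFromStart(LocalTopology topology, List<String> log, boolean dropStorageFirst) {
        if (dropStorageFirst) {
            // First fix option from the issue: there is no storage snapshot, so drop the storage.
            topology.clear();
        }
        for (String nodeName : log) {
            applyJoin(topology, nodeName);
        }
        return topology.version;
    }

    public static void main(String[] args) {
        List<String> log = List.of("node-a", "node-b", "node-c");

        // Storage survives the restart: the same commands are applied twice.
        LocalTopology kept = new LocalTopology();
        replayFromStart(kept, log, false); // normal operation before the restart
        long keptVersion = replayFromStart(kept, log, false); // replay after the restart
        System.out.println("kept storage: version=" + keptVersion); // prints 6

        // Storage is dropped before the replay: the version matches a node that never restarted.
        LocalTopology dropped = new LocalTopology();
        replayFromStart(dropped, log, false);
        long droppedVersion = replayFromStart(dropped, log, true);
        System.out.println("dropped storage: version=" + droppedVersion); // prints 3
    }
}
```

With the storage kept, the restarted node ends up at version 6 while its peers stay at 3. Dropping the storage first brings it back to 3. The second fix option (starting from the latest storage data instead of a snapshot) is not modelled here.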
--
This message was sent by Atlassian Jira
(v8.20.10#820010)