[
https://issues.apache.org/jira/browse/IGNITE-21588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mirza Aliev updated IGNITE-21588:
---------------------------------
Description:
When handling commands like {{JoinReadyCommand}} and {{NodesLeaveCommand}}, we
do the following:
* Read local state with {{{}readLogicalTopology(){}}}.
* Modify state according to the command.
* {*}Increase version{*}.
* Write new state with {{{}saveSnapshotToStorage(snapshot){}}}.
The problem lies in how the state is read and written: both operations are
purely local, so the version value is never replicated (a minimal sketch of
this pattern follows below).
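A minimal Java sketch of this pattern, assuming simplified stand-in types; the class, record, and method names below are illustrative placeholders, not the actual Ignite 3 CMG code:

{code:java}
import java.util.HashSet;
import java.util.Set;

// Illustrative stand-in for the CMG state machine; not the real Ignite 3 classes.
public class NonIdempotentTopology {

    // Hypothetical snapshot: a version counter plus the set of node names.
    record TopologySnapshot(long version, Set<String> nodes) {}

    // The snapshot lives in node-local storage only; the version is never replicated.
    private TopologySnapshot localSnapshot = new TopologySnapshot(0, Set.of());

    TopologySnapshot readLogicalTopology() {
        return localSnapshot; // 1. read local state
    }

    void saveSnapshotToStorage(TopologySnapshot snapshot) {
        localSnapshot = snapshot; // 4. write new state locally
    }

    // Handling a "nodes leave" command: the version is bumped unconditionally,
    // so applying the same command twice bumps it twice.
    void onNodesLeave(Set<String> leftNodes) {
        TopologySnapshot current = readLogicalTopology();
        Set<String> remaining = new HashSet<>(current.nodes()); // 2. modify state per the command
        remaining.removeAll(leftNodes);
        saveSnapshotToStorage(new TopologySnapshot(current.version() + 1, remaining)); // 3. increase version
    }
}
{code}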
What happens when we restart the node:
* The node starts from the local storage snapshot, which is, generally speaking, a {*}state in the past{*}.
* We apply the commands that were not yet applied in that snapshot.
* We apply these commands to the locally saved logical topology snapshot.
* That logical topology snapshot is a *state in the future* when compared to the storage snapshot.
* As a result, when some commands are re-applied, we *increase the version* one more time, thus breaking data consistency between nodes (see the replay sketch after this list).
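The double version bump can be reproduced with the hypothetical sketch above; the scenario below only illustrates the replay under those assumptions, it is not a test against real Ignite code:

{code:java}
import java.util.Set;

public class RestartReplayDemo {
    public static void main(String[] args) {
        NonIdempotentTopology topology = new NonIdempotentTopology();

        // Before the restart: the command is applied once; local storage now holds version 1.
        topology.onNodesLeave(Set.of("node-3"));

        // After the restart: the same command is replayed because the log snapshot lags
        // behind, but it is applied on top of the newer local storage snapshot.
        topology.onNodesLeave(Set.of("node-3"));

        // This node ends up at version 2, while nodes that applied the command exactly
        // once stay at version 1, so the versions diverge across the cluster.
        System.out.println(topology.readLogicalTopology().version()); // prints 2
    }
}
{code}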
This would be fine if the version were only used locally. However,
distribution zones rely on the version being consistent across all nodes in
the cluster, so DZ data nodes handling may break if any of the cluster nodes
restarts.
> CMG commands idempotency is broken
> ----------------------------------
>
> Key: IGNITE-21588
> URL: https://issues.apache.org/jira/browse/IGNITE-21588
> Project: Ignite
> Issue Type: Bug
> Reporter: Ivan Bessonov
> Priority: Major
> Labels: ignite-3
>