[
https://issues.apache.org/jira/browse/IGNITE-20870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827458#comment-17827458
]
Roman Puchkovskiy commented on IGNITE-20870:
--------------------------------------------
It turned out that the slight inconsistency of the RAFT snapshot (MV data may
be ahead of the snapshot meta, namely the RAFT applied index, and of the TX
state data) is not a problem, because the mechanisms we already have
compensate for it.
I added the following comment (written after discussing the matter with
[~sanpwc]); it explains the issue:
{quote}NB: this listener writes to the underlying MV partition storage
without taking the partition snapshots read lock. As a result, RAFT snapshots
transferred to a follower may be slightly inconsistent for a limited amount of
time.
A RAFT snapshot of a partition consists of MV data, TX state data and metadata
(which includes RAFT applied index). Here, the 'slight' inconsistency is that
MV data might be ahead of the snapshot meta (namely, RAFT applied index) and TX
state data.
This listener by its nature cannot advance the RAFT applied index (as it works
outside the RAFT framework). This alone makes the partition 'slightly
inconsistent' in the same way as defined above. So, if we solve this
inconsistency, we don't need to take the partition snapshots read lock either.
The inconsistency does not cause any real problems because it is resolved
later:
* If the follower with a 'slightly' inconsistent partition state becomes a
primary replica, it must first apply the whole available RAFT log from the
leader before actually becoming a primary; applying the log removes the
inconsistency.
* If a node with this inconsistency is going to become a primary and is
already the leader, then the above will not help, but the write intent
resolution procedure will close the gap.
* The two items above solve the inconsistency for RW transactions.
* For RO reads from such a 'slightly inconsistent' partition, write intent
resolution closes the gap as well.
{quote}
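The first bullet above can be illustrated with a toy model. This is only a sketch: `SnapshotGapSketch`, `LogEntry`, `Partition` and all method names are hypothetical and are not Ignite classes. It also assumes that the data the listener writes outside RAFT is later carried by a RAFT log entry, and that re-applying that entry is idempotent.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only: none of these types exist in Ignite. */
class SnapshotGapSketch {
    /** Hypothetical RAFT log entry carrying one idempotent key/value write. */
    static final class LogEntry {
        final long index;
        final String key;
        final String value;

        LogEntry(long index, String key, String value) {
            this.index = index;
            this.key = key;
            this.value = value;
        }
    }

    /** Toy partition: MV data plus the RAFT applied index kept in the snapshot meta. */
    static final class Partition {
        final Map<String, String> mvData = new TreeMap<>();
        long appliedIndex;

        /** RAFT path: writes the data and advances the applied index. */
        void applyFromLog(LogEntry entry) {
            mvData.put(entry.key, entry.value); // assumed idempotent: rewriting the same value is harmless
            appliedIndex = Math.max(appliedIndex, entry.index);
        }

        /** Listener path: writes the data but cannot advance the applied index. */
        void writeOutsideRaft(String key, String value) {
            mvData.put(key, value);
        }

        /** Copies the state the way a snapshot would transfer it. */
        Partition snapshot() {
            Partition copy = new Partition();
            copy.mvData.putAll(mvData);
            copy.appliedIndex = appliedIndex;
            return copy;
        }
    }

    static List<LogEntry> sampleLog() {
        return Arrays.asList(new LogEntry(1, "k1", "v1"), new LogEntry(2, "k2", "v2"));
    }

    /** Builds a follower from a snapshot whose MV data is ahead of its applied index. */
    static Partition followerFromInconsistentSnapshot(List<LogEntry> log) {
        Partition leader = new Partition();
        leader.applyFromLog(log.get(0));
        // The write for entry 2 reaches MV storage before entry 2 is applied through RAFT.
        leader.writeOutsideRaft(log.get(1).key, log.get(1).value);
        return leader.snapshot();
    }

    /** Replays every log entry above the applied index, as a would-be primary has to. */
    static void catchUp(Partition follower, List<LogEntry> log) {
        for (LogEntry entry : log) {
            if (entry.index > follower.appliedIndex) {
                follower.applyFromLog(entry);
            }
        }
    }

    public static void main(String[] args) {
        List<LogEntry> log = sampleLog();
        Partition follower = followerFromInconsistentSnapshot(log);
        System.out.println("before: appliedIndex=" + follower.appliedIndex + ", data=" + follower.mvData);
        catchUp(follower, log);
        System.out.println("after:  appliedIndex=" + follower.appliedIndex + ", data=" + follower.mvData);
    }
}
```

In this model the follower starts with the data of entry 2 but an applied index of 1; replaying the log rewrites the same value and moves the applied index to 2, so data and meta agree again.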
> Partition replica listener skips taking snapshot lock
> -----------------------------------------------------
>
> Key: IGNITE-20870
> URL: https://issues.apache.org/jira/browse/IGNITE-20870
> Project: Ignite
> Issue Type: Bug
> Reporter: Ivan Bessonov
> Assignee: Roman Puchkovskiy
> Priority: Major
> Labels: ignite-3
> Fix For: 3.0.0-beta2
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> See `SnapshotAwarePartitionDataStorage#acquirePartitionSnapshotsReadLock`.
> Right now the data being sent by the snapshot reader might be
> inconsistent, because reads are not synchronized with writes.
> There's a proposal to hide this lock somewhere inside the data storage
> instance, if possible. Either way, data consistency must be fixed.
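The proposal to hide the lock inside the storage could look roughly like the sketch below. It is not the `SnapshotAwarePartitionDataStorage` API: `LockHidingStorageSketch` and its methods are hypothetical, and the choice that writers share the read lock while the snapshot reader takes the write lock is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical storage wrapper that takes the snapshot lock itself on every write. */
class LockHidingStorageSketch {
    /** Immutable view of the data together with the applied index it corresponds to. */
    static final class Snapshot {
        final Map<String, String> rows;
        final long appliedIndex;

        Snapshot(Map<String, String> rows, long appliedIndex) {
            this.rows = rows;
            this.appliedIndex = appliedIndex;
        }
    }

    private final ReentrantReadWriteLock snapshotLock = new ReentrantReadWriteLock();
    private final Map<String, String> rows = new HashMap<>();
    private long appliedIndex;

    /** Writers share the read lock: they do not block each other on it, only a snapshot copy. */
    void write(String key, String value, long raftIndex) {
        snapshotLock.readLock().lock();
        try {
            synchronized (rows) { // protects the plain HashMap from concurrent writers
                rows.put(key, value);
                appliedIndex = Math.max(appliedIndex, raftIndex);
            }
        } finally {
            snapshotLock.readLock().unlock();
        }
    }

    /** The snapshot reader takes the write lock, so data and applied index come from one moment. */
    Snapshot takeSnapshot() {
        snapshotLock.writeLock().lock();
        try {
            synchronized (rows) {
                return new Snapshot(new HashMap<>(rows), appliedIndex);
            }
        } finally {
            snapshotLock.writeLock().unlock();
        }
    }
}
```

Because callers only see `write` and `takeSnapshot`, no caller can forget to take the lock, which is the point of hiding it.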