[jira] [Commented] (IGNITE-20870) Partition replica listener skips taking snapshot lock

Roman Puchkovskiy (Jira) Fri, 15 Mar 2024 02:53:32 -0700


    [ 
https://issues.apache.org/jira/browse/IGNITE-20870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827458#comment-17827458
 ]


Roman Puchkovskiy commented on IGNITE-20870:
--------------------------------------------

It turned out that the slight inconsistency of the RAFT snapshot (MV data may 
be 'ahead of' the snapshot meta, namely the RAFT applied index, and of the TX 
state data) is not a problem, because the mechanisms we already have 
compensate for it.

I added the following comment (written after discussing the matter with 
[~sanpwc]); it explains the issue:
{quote}NB: this listener writes to the underlying MV partition storage 
without taking the partition snapshots read lock. As a result, RAFT snapshots 
transferred to a follower may be slightly inconsistent for a limited amount of 
time.
A RAFT snapshot of a partition consists of MV data, TX state data and metadata 
(which includes RAFT applied index). Here, the 'slight' inconsistency is that 
MV data might be ahead of the snapshot meta (namely, RAFT applied index) and TX 
state data.
This listener by its nature cannot advance the RAFT applied index (as it works 
outside of the RAFT framework). This alone makes the partition 'slightly 
inconsistent' in the same way as defined above. So, if we resolve this 
inconsistency, we don't need to take the partition snapshots read lock either.
The inconsistency does not cause any real problems because it is resolved 
later:
* If a follower with a 'slightly' inconsistent partition state becomes a 
primary replica, it has to apply the whole available RAFT log from the leader 
before actually becoming a primary; applying the log removes the 
inconsistency.
* If a node with this inconsistency is going to become a primary and it is 
already the leader, then the above will not help. But the write intent 
resolution procedure will close the gap.
* The two items above resolve the inconsistency for RW transactions.
* For RO reads from such a 'slightly inconsistent' partition, write intent 
resolution closes the gap as well.
{quote}
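
To illustrate the first bullet, here is a minimal toy sketch, not Ignite code. All types and method names in it (LogEntry, Partition, writeOutsideRaft, catchUp) are hypothetical. It also assumes that the write made outside of RAFT corresponds to a log entry that is applied later and that applying an entry is idempotent. It shows a snapshot whose MV data is ahead of the applied index, and how replaying the log tail removes the gap:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only: none of these types are Ignite classes. */
class SlightInconsistencySketch {
    /** Hypothetical RAFT log entry: an index plus a single key/value write. */
    static final class LogEntry {
        final long index;
        final String key;
        final String value;

        LogEntry(long index, String key, String value) {
            this.index = index;
            this.key = key;
            this.value = value;
        }
    }

    /** Toy partition: MV data plus the RAFT applied index kept in the snapshot meta. */
    static final class Partition {
        final Map<String, String> mvData = new TreeMap<>();
        long appliedIndex;

        /** Applies an entry inside RAFT: data and applied index advance together. Idempotent. */
        void apply(LogEntry entry) {
            mvData.put(entry.key, entry.value);
            appliedIndex = Math.max(appliedIndex, entry.index);
        }

        /** Writes outside of RAFT, like the listener: the applied index stays where it was. */
        void writeOutsideRaft(String key, String value) {
            mvData.put(key, value);
        }

        /** Copies the current state, standing in for a RAFT snapshot sent to a follower. */
        Partition snapshot() {
            Partition copy = new Partition();
            copy.mvData.putAll(mvData);
            copy.appliedIndex = appliedIndex;
            return copy;
        }

        /** Replays every log entry above the applied index (assumed catch-up step). */
        void catchUp(List<LogEntry> log) {
            for (LogEntry entry : log) {
                if (entry.index > appliedIndex) {
                    apply(entry);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<LogEntry> log = List.of(new LogEntry(1, "a", "1"), new LogEntry(2, "b", "2"));

        Partition leader = new Partition();
        leader.apply(log.get(0));
        // The write of entry 2 reaches MV storage before the entry is applied through RAFT.
        leader.writeOutsideRaft("b", "2");

        // The snapshot carries key "b" although its applied index still points at entry 1.
        Partition follower = leader.snapshot();
        System.out.println("after snapshot: index=" + follower.appliedIndex + ", data=" + follower.mvData);

        // Replaying the log tail brings the applied index in line with the MV data.
        follower.catchUp(log);
        System.out.println("after catch-up: index=" + follower.appliedIndex + ", data=" + follower.mvData);
    }
}
```

The sketch covers only the log replay path. The write intent resolution path from the other bullets is not modelled here.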

> Partition replica listener skips taking snapshot lock
> -----------------------------------------------------
>
>                 Key: IGNITE-20870
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20870
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Ivan Bessonov
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> See `SnapshotAwarePartitionDataStorage#acquirePartitionSnapshotsReadLock`. 
> Right now the data that's being sent by the snapshot reader might be 
> inconsistent, because reads are not synchronized with writes.
> There's a proposal to hide this lock somewhere inside the data storage 
> instance, if possible. Either way, data consistency must be fixed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
