[ 
https://issues.apache.org/jira/browse/IGNITE-20870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818130#comment-17818130
 ] 

Roman Puchkovskiy edited comment on IGNITE-20870 at 2/17/24 7:54 AM:
---------------------------------------------------------------------

For our Raft snapshot mechanics to work correctly, the following invariants 
must be maintained:
 # All updates to storages must be made via SnapshotAwarePartitionDataStorage
 # ... while holding PartitionSnapshotsReadLock
 # After executing a 'command', lastUpdateIndex+Term must be updated according 
to the 'command'
 # ... this index+term update must be made under the same 
PartitionSnapshotsReadLock mentioned in item 2

That's why in PartitionListener we take PartitionSnapshotsReadLock explicitly 
and not inside storage methods (also because execution of one command actually 
does a lot more now than just calling one storage method: it might also do 
storage cleanup [it's about write intent cleanup, not something like GC]). It 
doesn't seem possible to move acquisition of the lock inside storage methods.
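
To illustrate the invariants above, here is a minimal sketch of a listener that 
takes the lock explicitly around the whole command. All class and method names 
in it (SketchPartitionStorage, SketchListener, onWrite, etc.) are hypothetical 
stand-ins and not the actual Ignite API. It also assumes that the snapshot side 
takes the matching exclusive lock when it makes a cut, which is not stated 
above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a snapshot-aware partition storage. */
class SketchPartitionStorage {
    // Command executors share the read lock; a snapshot cut is assumed
    // to take the write lock (an assumption of this sketch).
    private final ReentrantReadWriteLock snapshotsLock = new ReentrantReadWriteLock();
    private final Map<String, String> rows = new ConcurrentHashMap<>();
    private volatile long lastAppliedIndex;
    private volatile long lastAppliedTerm;

    void acquireSnapshotsReadLock() { snapshotsLock.readLock().lock(); }
    void releaseSnapshotsReadLock() { snapshotsLock.readLock().unlock(); }
    int readLockHolds() { return snapshotsLock.getReadLockCount(); }

    void put(String key, String value) { rows.put(key, value); }
    void cleanupWriteIntent(String key) { rows.remove(key + ".intent"); }
    void lastApplied(long index, long term) {
        lastAppliedIndex = index;
        lastAppliedTerm = term;
    }

    String get(String key) { return rows.get(key); }
    long lastAppliedIndex() { return lastAppliedIndex; }
    long lastAppliedTerm() { return lastAppliedTerm; }
}

/** Hypothetical listener that executes one 'command' per call. */
class SketchListener {
    private final SketchPartitionStorage storage;

    SketchListener(SketchPartitionStorage storage) { this.storage = storage; }

    void onWrite(long index, long term, String key, String value) {
        // Invariant 2: every storage update happens while the read lock is held.
        storage.acquireSnapshotsReadLock();
        try {
            // One command may touch the storage more than once
            // (here: write intent cleanup plus the write itself).
            storage.cleanupWriteIntent(key);
            storage.put(key, value);
            // Invariants 3 and 4: index+term are advanced under the same lock.
            storage.lastApplied(index, term);
        } finally {
            storage.releaseSnapshotsReadLock();
        }
    }
}
```

Because the lock spans the cleanup, the write, and the index+term update, a 
snapshot cut in this sketch cannot land between them, which is what a lock 
taken inside each individual storage method would not guarantee.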

There is a problem that is a lot worse: PartitionReplicaListener operates 
outside of Raft, and it does not know any index when executing its 'commands'. 
Right now, I don't have any ideas on how to deal with this, as the Raft 
snapshot mechanics clearly require a Raft index to make snapshot cuts. Maybe 
try to use HLC?



> Partition replica listener skips taking snapshot lock
> -----------------------------------------------------
>
>                 Key: IGNITE-20870
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20870
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>
> See `SnapshotAwarePartitionDataStorage#acquirePartitionSnapshotsReadLock`. 
> Right now the data that's being sent by the snapshot reader might be 
> inconsistent, because reads are not synchronized with writes.
> There's a proposal to hide this lock somewhere inside the data storage 
> instance, if possible. In any case, data consistency must be fixed.



