[ 
https://issues.apache.org/jira/browse/IGNITE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876740#comment-17876740
 ] 

Roman Puchkovskiy commented on IGNITE-20124:
--------------------------------------------

The patch looks good to me

> Prevent double storage updates within primary
> ---------------------------------------------
>
>                 Key: IGNITE-20124
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20124
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexander Lapin
>            Assignee: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3, transactions
>          Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> h3. Motivation
> In order to preserve the guarantee that the primary replica is always 
> up-to-date it's required to:
>  * In case of common RW transaction - insert writeIntent to the storage 
> within primary before replication.
>  * In case of one-phase-commit - insert commitedWrite after the replication.
> Both have already been done. However, that means that if primary is part of 
> the replication group, and it's true in almost all cases, we will double the 
> update:
>  * In case of common RW transaction - through the replication.
>  * In case of one-phase-commit - either through the replication, or though 
> post update, if replication was fast enough.
> h3. Definition of Done
>  * Prevent double storage updates within primary.
> h3. Implementation Notes
> The easiest way to prevent double insert is to skip one if local safe time is 
> greater or equal to candidates. There are 3 places where we update partition 
> storage:
>  # Primary pre-replication update. In that case, the second update on 
> replication should be excluded.
>  # Primary post-replication update in case of 1PC. It's possible to see 
> already updated data if replication was already processed locally. It is 
> expected to be already covered in 
> https://issues.apache.org/jira/browse/IGNITE-15927 . We should check the 
> primary safe time on post-replication update and don't do update if the safe 
> time is already adjusted.
>  # Insert through replication. In case of !1PC on every primary there will be 
> double insert (see 1). In case of 1PC it depends, so we should check the safe 
> time on primary to know whether the update should be done (see 2).
> In every case, the storage indexes still should be adjusted on replication, 
> as it is done now, because the progress of indexes on FSM write operations 
> should not be violated - otherwise, a Raft snapshot-based rebalance would be 
> broken. We may have two non-consistent storage updates on primary which may 
> affect different fsyncs, but the transactional correctness isn't violated by 
> these non-consistent storage updates, because there is only a possibility 
> that some writes or write intents will go ahead of indexes and therefore will 
> be included into snapshots - however we still can process such writes and 
> resolve write intents.
> Also, the safe time needs to be updated on the primary replica now. There can 
> be following scenarios:
>  # Two-phase commit: we can advance safe time on primary, make 
> pre-replication update and then run Raft command. Both safe time adjustment 
> and storage update happen before replication.
>  # One-phase commit: safe time should be advanced after completeness of Raft 
> command future. There is no happens-before between the future callback and 
> the replication handler, so the safe time should be checked and advanced in 
> both places. We should use some critical section, preventing race between 
> safe time check, safe time adjustment and storage update. For this purpose we 
> can use row locks that are taken inside of {{{}storage#runConsistently{}}}. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to