[jira] [Assigned] (IGNITE-17263) Implement leader to replica safe time propagation

Denis Chudov (Jira) Tue, 13 Sep 2022 07:38:45 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-17263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Denis Chudov reassigned IGNITE-17263:
-------------------------------------

    Assignee: Denis Chudov

> Implement leader to replica safe time propagation
> -------------------------------------------------
>
>                 Key: IGNITE-17263
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17263
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexander Lapin
>            Assignee: Denis Chudov
>            Priority: Major
>              Labels: ignite-3, transaction3_ro
>         Attachments: Screenshot from 2022-07-06 16-48-30.png, Screenshot from 
> 2022-07-06 16-48-41.png
>
>
> To perform replica reads, a replica must either use a read index or check 
> the safe time. Let's recall the corresponding section of the tx design 
> document.
> RO transactions can be executed on non-primary replicas. Write intent 
> resolution alone doesn’t help, because a write intent for a committed 
> transaction may not yet be replicated to the replica. To mitigate this issue, 
> it’s enough to run readIndex on each mapped partition leader, fetch the 
> commit index, and wait on the replica until it’s applied. This guarantees 
> that all required write intents are replicated and present locally. After 
> that, the normal write intent resolution should do the job.
> There is a second option, which doesn’t require the network RTT. We can use a 
> special low watermark timestamp (safeTs) per replication group, which 
> corresponds to the apply index of a replicated entry: when the apply index 
> advances during replication, the safeTs is monotonically incremented too. 
> The HLC timestamp used to advance the safeTs is assigned to each replicated 
> entry in replication order.
> Special measures are needed to periodically advance the safeTs if no updates 
> are happening. It’s enough to use a special replication command for this 
> purpose.
> All we need during an RO txn is to wait until the safeTs advances past the 
> RO txn readTs.
>  !Screenshot from 2022-07-06 16-48-30.png! 
> In the picture we have two concurrent transactions mapped to the same 
> partition: T1 and T2.
> OpReq(w1(x)) and OpReq(w2(x)) are received concurrently. Each write intent is 
> assigned a timestamp in a monotonic order consistent with the replication 
> order. This can be done, for example, when replication entries are dequeued 
> for processing by the replication protocol (we assume entries are replicated 
> successively).
> Waiting for the safeTs alone is not enough: it may never advance if there is 
> no activity in the partition. Consider the next diagram:
>  !Screenshot from 2022-07-06 16-48-41.png! 
> We need an additional safeTsSync command to propagate a safeTs event in case 
> there are no updates in the partition.
> In fact, it seems possible to reuse common Raft messages such as 
> heartbeatRequests and vote/prevoteRequests, together with 
> appendEntriesRequests, to propagate safeTime from the leader to replicas. As 
> mentioned in 
> [IGNITE-17261|https://issues.apache.org/jira/browse/IGNITE-17261], the 
> txnState switch should be linearized with all safe-time propagation requests.
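
The replica-side behaviour described above (safeTs advancing monotonically as entries are applied, and an RO read waiting until safeTs reaches its readTs) could be sketched roughly as follows. This is only an illustration under stated assumptions, not Ignite's implementation: the class SafeTimeTracker and its methods advance, waitFor and current are hypothetical names, timestamps are modelled as plain long values rather than real HLC timestamps, and the wait is treated as satisfied once safeTs >= readTs.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical replica-side safe time tracker (illustrative only, not Ignite API). */
final class SafeTimeTracker {
    /** Highest timestamp up to which this replica is known to have applied all entries. */
    private final AtomicLong safeTs = new AtomicLong(0L);

    /** Pending RO reads, keyed by the readTs they are waiting for. */
    private final ConcurrentSkipListMap<Long, CompletableFuture<Void>> waiters =
            new ConcurrentSkipListMap<>();

    /**
     * Called when a replicated entry is applied, or when an idle-propagation
     * message (a safeTsSync command or a heartbeat carrying safe time) arrives.
     */
    void advance(long ts) {
        // Keep safeTs monotonic: a stale or reordered timestamp never moves it backwards.
        long current = safeTs.accumulateAndGet(ts, Math::max);

        // Complete every waiter whose readTs is now covered (assumed inclusive).
        ConcurrentNavigableMap<Long, CompletableFuture<Void>> ready =
                waiters.headMap(current, true);
        Iterator<Map.Entry<Long, CompletableFuture<Void>>> it = ready.entrySet().iterator();
        while (it.hasNext()) {
            it.next().getValue().complete(null);
            it.remove();
        }
    }

    /** Returns a future that completes once safeTs has reached readTs. */
    CompletableFuture<Void> waitFor(long readTs) {
        if (safeTs.get() >= readTs) {
            return CompletableFuture.completedFuture(null);
        }
        CompletableFuture<Void> f =
                waiters.computeIfAbsent(readTs, k -> new CompletableFuture<>());
        // Re-check to close the race with an advance() that ran after the first check.
        if (safeTs.get() >= readTs) {
            f.complete(null);
            waiters.remove(readTs, f);
        }
        return f;
    }

    /** Current safe timestamp. */
    long current() {
        return safeTs.get();
    }
}
```

In this sketch a replica would call advance(entryTs) for each applied entry and for each idle safe-time message from the leader, while an RO read on that replica would chain its local lookup onto waitFor(readTs). Without the idle messages, a waiter on a quiet partition would never complete, which is the case the second diagram illustrates.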



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
