[
https://issues.apache.org/jira/browse/IGNITE-17263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Denis Chudov reassigned IGNITE-17263:
-------------------------------------
Assignee: Denis Chudov
> Implement leader to replica safe time propagation
> -------------------------------------------------
>
> Key: IGNITE-17263
> URL: https://issues.apache.org/jira/browse/IGNITE-17263
> Project: Ignite
> Issue Type: Improvement
> Reporter: Alexander Lapin
> Assignee: Denis Chudov
> Priority: Major
> Labels: ignite-3, transaction3_ro
> Attachments: Screenshot from 2022-07-06 16-48-30.png, Screenshot from
> 2022-07-06 16-48-41.png
>
>
> To perform replica reads, a replica must either use a read index or check
> the safe time. Let's recall the corresponding section of the tx design
> document.
> RO transactions can be executed on non-primary replicas. Write intent
> resolution alone doesn’t help, because a write intent for a committed
> transaction may not yet be replicated to the replica. To mitigate this issue,
> it’s enough to run readIndex on each mapped partition leader, fetch the commit
> index and wait on a replica until it’s applied. This guarantees that all
> required write intents are replicated and present locally. After that the
> normal write intent resolution should do the job.
> There is a second option, which doesn’t require the network RTT. We can use a
> special low watermark timestamp (safeTs) per replication group, which
> corresponds to the apply index of a replicated entry, so when the apply index
> is advanced during replication, the safeTs is monotonically incremented
> too. The HLC used for safeTs advancing is assigned to a replicated entry in
> an ordered way.
> Special measures are needed to periodically advance the safeTs if no updates
> are happening. It’s enough to use a special replication command for this
> purpose.
> All we need during an RO txn is to wait until the safeTs advances past the RO
> txn readTs.
> !Screenshot from 2022-07-06 16-48-30.png!
> In the picture we have two concurrent transactions mapped to the same
> partition: T1 and T2.
> OpReq(w1(x)) and OpReq(w2(x)) are received concurrently. Each write intent is
> assigned a timestamp in a monotonic order consistent with the replication
> order. This can be done, for example, when replication entries are dequeued
> for processing by the replication protocol (we assume entries are replicated
> successively).
> It’s not enough only to wait for safeTs: the safeTs may never advance due to
> absence of activity in the partition. Consider the next diagram:
> !Screenshot from 2022-07-06 16-48-41.png!
> We need an additional safeTsSync command to propagate a safeTs event in case
> there are no updates in the partition.
> Actually, it seems possible to reuse common raft messages such as
> heartbeatRequests and vote/prevoteRequests, together with
> appendEntriesRequests, in order to propagate safeTime from leader to
> replicas. As mentioned in
> [IGNITE-17261|https://issues.apache.org/jira/browse/IGNITE-17261], the txnState
> switch should be linearized with all safe-time propagation requests.
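For illustration only, the replica-side behaviour described above (safeTs advances monotonically as entries or idle sync messages arrive, and an RO read waits until safeTs reaches its readTs) could be sketched roughly as follows. The class `SafeTimeTracker` and its methods are hypothetical names made up for this sketch and are not part of the Ignite code base. Timestamps are plain `long` values standing in for HLC timestamps, and treating "advances past readTs" as `safeTs >= readTs` is an assumption.

```java
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

/** Hypothetical replica-side safe time tracker (a sketch, not the Ignite API). */
class SafeTimeTracker {
    /** Highest timestamp known to be safe for reads on this replica. */
    private long safeTs;

    /** Pending RO reads keyed by the readTs they are waiting for. */
    private final TreeMap<Long, CompletableFuture<Void>> waiters = new TreeMap<>();

    /**
     * Called when a replicated entry is applied, or when an idle message
     * carrying the leader's timestamp (e.g. a heartbeat) is received.
     * Stale or reordered timestamps are ignored, so safeTs never goes back.
     */
    synchronized void advance(long ts) {
        if (ts <= safeTs) {
            return;
        }
        safeTs = ts;
        // Complete every waiter whose readTs is now covered by safeTs.
        NavigableMap<Long, CompletableFuture<Void>> ready = waiters.headMap(ts, true);
        ready.values().forEach(f -> f.complete(null));
        ready.clear();
    }

    /** Returns a future that completes once safeTs has reached readTs. */
    synchronized CompletableFuture<Void> waitFor(long readTs) {
        if (safeTs >= readTs) {
            return CompletableFuture.completedFuture(null);
        }
        return waiters.computeIfAbsent(readTs, k -> new CompletableFuture<>());
    }

    /** Current safe timestamp, mainly useful for diagnostics. */
    synchronized long current() {
        return safeTs;
    }
}
```

In this sketch an RO read would call `waitFor(readTs)` and run write intent resolution only after the returned future completes. Without a periodic `advance` call driven by an idle message from the leader, a read on an inactive partition would wait indefinitely, which is the problem the safeTsSync command is meant to solve.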
--
This message was sent by Atlassian Jira
(v8.20.10#820010)