[
https://issues.apache.org/jira/browse/IGNITE-17872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexandr Shapkin resolved IGNITE-17872.
---------------------------------------
Resolution: Won't Do
> Fetch commit index on non-primary replicas instead of waiting for safe time
> in case of RO tx on idle cluster
> ------------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-17872
> URL: https://issues.apache.org/jira/browse/IGNITE-17872
> Project: Ignite
> Issue Type: Improvement
> Reporter: Denis Chudov
> Priority: Major
> Labels: ignite-3, transaction3_ro
>
> Safe time for non-primary replicas (see IGNITE-17263) was conceived as an
> optimization to avoid unnecessary network hops. Safe time is propagated from
> the primary replica via raft appendEntries messages. Under a constant load
> of RW transactions, these messages refresh safe time on the replicas
> frequently. On an idle cluster, or a cluster with a read-only load, safe
> time is propagated only periodically, via heartbeats. This means that if an
> RO transaction with a read timestamp in the present or future tries to read
> a value from a non-primary replica, it first waits for safe time, which
> advances only as often as heartbeats arrive. The duration of the read
> operation may therefore be close to the heartbeat period, which adds
> avoidable latency to RO reads.
> Example:
> Heartbeat period is 500 ms.
> Current safe time on replica is 1.
> We are processing a read-only request with timestamp=2.
> There were no RW transactions for some time, and the next expected update of
> safe time, according to the heartbeat period, is 1 + 500 = 501.
> This means that we must wait for about 499 ms (assuming the clock skew and
> ping in the cluster are 0) before proceeding with the RO request.
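
The arithmetic in the example above can be sketched as follows. This is an
illustration only, not Ignite code: the class and method names are
hypothetical. It assumes, as the example does, zero clock skew and ping, that
each heartbeat advances safe time by exactly one heartbeat period, and that the
request arrives at a wall-clock time equal to its read timestamp.

```java
/** Hypothetical helper, not part of Ignite: models the wait described in the example. */
final class SafeTimeWait {
    /**
     * Estimates how long an RO request must wait on an idle replica when
     * safe time advances only through periodic heartbeats.
     *
     * Assumption: the request arrives at wall-clock time == readTs, and
     * clock skew and network latency are zero.
     */
    static long expectedWaitMs(long safeTime, long readTs, long heartbeatPeriodMs) {
        if (readTs <= safeTime) {
            return 0; // Safe time already covers the read timestamp.
        }

        // Number of heartbeats needed until safe time reaches readTs (ceiling division).
        long heartbeats = (readTs - safeTime + heartbeatPeriodMs - 1) / heartbeatPeriodMs;
        long nextSafeTime = safeTime + heartbeats * heartbeatPeriodMs;

        return nextSafeTime - readTs;
    }
}
```

With the values from the example, `SafeTimeWait.expectedWaitMs(1, 2, 500)`
returns 499.
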
> So, even though safe time is an optimization, we shouldn't use it in cases
> when there are no RW transactions affecting the given replica, and the
> timestamp of the current RO transaction is greater than safe time. Instead
> of waiting for the safe time update, we should fall back to reading the
> index from the leader to minimize the processing time of the current RO
> request.
> To do this, we should compare the read timestamp with safe time. If the read
> timestamp is greater, and the time since the last RW transaction affecting
> this replica exceeds some timeout (i.e. we expect safe time to be updated
> only via periodic updates), we shouldn't wait for safe time. Instead, we
> should send a read index request to the leader to get the latest updates
> that may not have been replicated yet.
> If readIndex shows that the current committed index on the leader is the
> same as on the replica (i.e. the replica doesn't expect any updates that are
> being replicated), the read-only request must wait for the safe time update
> without further attempts to repeat the readIndex operation, which would most
> likely be useless.
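
The decision procedure described above could look roughly like this. It is a
sketch under assumptions, not the proposed implementation: all names
(`RoReadPlanner`, `idleThresholdMs`, the enum constants) are hypothetical, and
the `WAIT_FOR_LOCAL_APPLY` branch (the leader is ahead, so the replica waits for
the pending updates to arrive) is my reading of what should happen in the case
the description leaves implicit.

```java
/** Hypothetical sketch, not part of Ignite: chooses how a non-primary replica serves an RO read. */
final class RoReadPlanner {
    enum Decision { READ_LOCALLY, WAIT_FOR_SAFE_TIME, ASK_LEADER_FOR_READ_INDEX }

    enum AfterReadIndex { WAIT_FOR_SAFE_TIME, WAIT_FOR_LOCAL_APPLY }

    /**
     * @param msSinceLastRwUpdate time since the last RW transaction touched this replica
     * @param idleThresholdMs     assumed timeout after which the replica counts as idle
     */
    static Decision decide(long readTs, long safeTime, long msSinceLastRwUpdate, long idleThresholdMs) {
        if (readTs <= safeTime) {
            return Decision.READ_LOCALLY; // Safe time already covers the read.
        }

        if (msSinceLastRwUpdate > idleThresholdMs) {
            // Safe time is expected to move only with heartbeats: ask the leader instead.
            return Decision.ASK_LEADER_FOR_READ_INDEX;
        }

        return Decision.WAIT_FOR_SAFE_TIME; // RW load should refresh safe time soon.
    }

    /** Interprets the readIndex result; readIndex is not repeated in either case. */
    static AfterReadIndex afterReadIndex(long leaderCommitIndex, long localIndex) {
        return leaderCommitIndex <= localIndex
            ? AfterReadIndex.WAIT_FOR_SAFE_TIME    // Nothing is in flight.
            : AfterReadIndex.WAIT_FOR_LOCAL_APPLY; // Assumption: wait for pending updates.
    }
}
```

For the idle-cluster example, `decide(2, 1, 10_000, 1_000)` yields
`ASK_LEADER_FOR_READ_INDEX`; the threshold value is arbitrary.
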
> We should also think about measures to prevent extra load from replicas
> spamming readIndex when they receive multiple read-only requests.
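
One possible measure, offered only as an illustration, is to coalesce
concurrent readIndex calls so that a replica has at most one request in flight
and all waiting RO requests share its result. The ticket does not prescribe
this approach. `ReadIndexCoalescer` and the `readIndexRpc` supplier are
hypothetical names standing in for whatever actually sends the request to the
leader. Whether a result requested on behalf of an earlier RO request is fresh
enough for a later one is left open here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical sketch, not part of Ignite: shares one in-flight readIndex among callers. */
final class ReadIndexCoalescer {
    /** Assumed hook that sends a readIndex request to the leader. */
    private final Supplier<CompletableFuture<Long>> readIndexRpc;

    /** The request currently in flight, or null if there is none. */
    private final AtomicReference<CompletableFuture<Long>> inFlight = new AtomicReference<>();

    ReadIndexCoalescer(Supplier<CompletableFuture<Long>> readIndexRpc) {
        this.readIndexRpc = readIndexRpc;
    }

    CompletableFuture<Long> readIndex() {
        for (;;) {
            CompletableFuture<Long> current = inFlight.get();
            if (current != null) {
                return current; // Join the request that is already in flight.
            }

            CompletableFuture<Long> mine = new CompletableFuture<>();
            if (!inFlight.compareAndSet(null, mine)) {
                continue; // Another thread won the race; retry and join it.
            }

            try {
                readIndexRpc.get().whenComplete((index, err) -> {
                    // Clear the slot first so later callers start a fresh request.
                    inFlight.compareAndSet(mine, null);
                    if (err != null) {
                        mine.completeExceptionally(err);
                    } else {
                        mine.complete(index);
                    }
                });
            } catch (RuntimeException e) {
                // The hook failed synchronously: release the slot and report the error.
                inFlight.compareAndSet(mine, null);
                mine.completeExceptionally(e);
            }

            return mine;
        }
    }
}
```

Callers that arrive while a request is pending receive the same future, so a
burst of RO requests results in a single message to the leader.
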