[ 
https://issues.apache.org/jira/browse/IGNITE-17872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18082249#comment-18082249
 ] 

Alexandr Shapkin commented on IGNITE-17872:
-------------------------------------------

This is not relevant anymore.

> Fetch commit index on non-primary replicas instead of waiting for safe time 
> in case of RO tx on idle cluster
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-17872
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17872
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Denis Chudov
>            Priority: Major
>              Labels: ignite-3, transaction3_ro
>
> Safe time for non-primary replicas (see IGNITE-17263) was conceived as an
> optimization to avoid unnecessary network hops. Safe time is propagated from
> the primary replica via raft appendEntries messages. When RW transactions
> put constant load on the cluster, these messages refresh safe time on
> replicas frequently. On an idle cluster, or a cluster with read-only load,
> safe time is propagated only periodically, via heartbeats. This means that
> if an RO transaction with a read timestamp in the present or future tries to
> read a value from a non-primary replica, it first waits for safe time, which
> advances only as often as heartbeat messages arrive. The duration of the
> read operation may therefore be close to the heartbeat period, which adds
> avoidable latency to RO reads.
> Example:
> The heartbeat period is 500 ms.
> The current safe time on the replica is 1.
> We are processing a read-only request with timestamp=2.
> There have been no RW transactions for some time, so the next expected
> update of safe time, according to the heartbeat period, is 1 + 500 = 501.
> This means that we have to wait about 499 ms (assuming zero clock skew and
> zero network latency in the cluster) before we can proceed with processing
> the RO request.
> So, even though safe time is an optimization, we shouldn't use it when
> there are no RW transactions affecting the given replica and the timestamp
> of the current RO transaction is greater than safe time. Instead of waiting
> for the safe time update, we should fall back to reading the commit index
> from the leader to minimize the processing time of the current RO request.
> To do this, we should compare the read timestamp with safe time. If the read
> timestamp is greater, and the time since the last RW transaction affecting
> this replica exceeds some timeout (i.e. we expect safe time to be updated
> only via periodic updates), we shouldn't wait for safe time. Instead, we
> should send a read index request to the leader to learn about the latest
> updates that may not have been replicated yet.
> If readIndex shows that the current committed index on the leader is the
> same as the one on the replica (i.e. the replica doesn't expect any updates
> that are still being replicated), the read-only request must wait for the
> safe time update and must not repeat the readIndex operation, since further
> attempts would most likely be useless.
> We should also think about measures to prevent extra load from replicas
> spamming the leader with readIndex requests while receiving multiple
> read-only requests.
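
The check described in the issue can be sketched roughly as follows. This is
only an illustration in Java, not Ignite code: the class, enum and method
names are hypothetical, timestamps are plain millisecond values rather than
hybrid timestamps, and the WAIT_FOR_LOCAL_APPLY outcome is an assumption about
the case where the leader is ahead of the replica, which the description
leaves open.

```java
/** Hypothetical sketch of the proposed read path; no name here is an Ignite API. */
final class RoReadPathSketch {
    /** What a non-primary replica does with an incoming RO read. */
    enum Action { READ_LOCALLY, WAIT_FOR_SAFE_TIME, READ_INDEX_FROM_LEADER, WAIT_FOR_LOCAL_APPLY }

    /** Wait until the next heartbeat advances safe time past the read timestamp (idle cluster). */
    static long expectedSafeTimeWaitMs(long safeTimeMs, long readTsMs, long heartbeatPeriodMs) {
        return Math.max(0L, safeTimeMs + heartbeatPeriodMs - readTsMs);
    }

    /** First decision: serve locally, wait for safe time, or ask the leader for its commit index. */
    static Action chooseAction(long readTsMs, long safeTimeMs, long msSinceLastRw, long idleTimeoutMs) {
        if (readTsMs <= safeTimeMs) {
            // Safe time already covers the read timestamp, no waiting is needed.
            return Action.READ_LOCALLY;
        }
        // The replica counts as idle when no RW transaction touched it for longer than
        // the timeout, i.e. safe time is expected to advance only via heartbeats.
        boolean idle = msSinceLastRw > idleTimeoutMs;
        return idle ? Action.READ_INDEX_FROM_LEADER : Action.WAIT_FOR_SAFE_TIME;
    }

    /** Second decision, taken once the leader has answered the readIndex request. */
    static Action afterReadIndex(long leaderCommitIndex, long localCommitIndex) {
        if (leaderCommitIndex == localCommitIndex) {
            // Nothing is in flight: wait for safe time and do not repeat readIndex.
            return Action.WAIT_FOR_SAFE_TIME;
        }
        // Assumption: the leader is ahead, so the replica waits until it has applied that index.
        return Action.WAIT_FOR_LOCAL_APPLY;
    }

    public static void main(String[] args) {
        // Numbers from the example: safe time 1, read timestamp 2, heartbeat period 500 ms.
        System.out.println(expectedSafeTimeWaitMs(1, 2, 500)); // prints 499
        // Idle for 1000 ms with an assumed 500 ms idle timeout.
        System.out.println(chooseAction(2, 1, 1_000, 500)); // prints READ_INDEX_FROM_LEADER
    }
}
```

The sketch does not cover throttling of readIndex requests; coalescing
concurrent requests per replica would be one possible measure.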



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
