[ 
https://issues.apache.org/jira/browse/IGNITE-17872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-17872:
----------------------------------
    Issue Type: Improvement  (was: Bug)

> Fetch commit index on non-primary replicas instead of waiting for safe time 
> in case of RO tx on idle cluster
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-17872
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17872
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Denis Chudov
>            Priority: Major
>              Labels: ignite-3, transaction3_ro
>
> Safe time for non-primary replicas (see IGNITE-17263) was conceived as an 
> optimization to avoid unnecessary network hops. Safe time is propagated from 
> the primary replica via Raft appendEntries messages. When the cluster is 
> under constant load from RW transactions, these messages refresh safe time 
> on the replicas frequently. On an idle cluster, or a cluster with read-only 
> load, safe time is propagated only periodically, via heartbeats. This means 
> that an RO transaction whose read timestamp is in the present or future, and 
> which reads a value from a non-primary replica, must first wait for safe 
> time to advance. That wait is bound to the frequency of heartbeat messages, 
> so the duration of the read operation may be close to the heartbeat period. 
> This is unexpected for an optimization and will cause performance issues. 
> Example:
> Heartbeat period is 500 ms. 
> Current safe time on the replica is 1.
> We are processing a read-only request with timestamp=2. 
> There have been no RW transactions for some time, and the next expected 
> update of safe time, according to the heartbeat period, is 1 + 500 = 501.
> This means that we have to wait about 499 ms (assuming the clock skew and 
> ping in the cluster are 0) before proceeding with the RO request.
> So, even though safe time is an optimization, we shouldn't rely on it when 
> there are no RW transactions affecting the given replica and the timestamp 
> of the current RO transaction is greater than safe time. Instead of waiting 
> for the safe time update, we should fall back to reading the index from the 
> leader to minimize the processing time of the current RO request.
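
The arithmetic in the example and the proposed fallback rule can be sketched as follows. This is only an illustration under stated assumptions: the class, enum, and method names are hypothetical and do not come from the Ignite code base, timestamps are plain millisecond values, clock skew and network latency are taken as 0, and the way a replica learns that RW updates are in flight (the rwUpdatesPending flag) is assumed rather than specified by the issue.

```java
/** Hypothetical sketch; none of these names are taken from the Ignite code base. */
final class SafeTimeSketch {
    /** Possible ways for a non-primary replica to serve an RO read. */
    enum ReadPath { LOCAL_READ, WAIT_FOR_SAFE_TIME, FETCH_INDEX_FROM_LEADER }

    /**
     * How long an RO read arriving at readTs would wait if safe time advances
     * only by heartbeats (zero clock skew and network latency assumed).
     */
    static long expectedWaitMs(long safeTime, long readTs, long heartbeatPeriodMs) {
        if (readTs <= safeTime) {
            return 0; // Safe time already covers the read timestamp.
        }
        // Number of heartbeat periods needed before safe time reaches readTs (rounded up).
        long periods = (readTs - safeTime + heartbeatPeriodMs - 1) / heartbeatPeriodMs;
        long nextSafeTime = safeTime + periods * heartbeatPeriodMs;
        return nextSafeTime - readTs;
    }

    /**
     * Decision rule from the description: wait for safe time only if RW traffic
     * is expected to advance it soon; otherwise ask the leader for its index.
     * The rwUpdatesPending flag is an assumption made for this sketch.
     */
    static ReadPath choosePath(long safeTime, long readTs, boolean rwUpdatesPending) {
        if (readTs <= safeTime) {
            return ReadPath.LOCAL_READ;
        }
        return rwUpdatesPending
                ? ReadPath.WAIT_FOR_SAFE_TIME
                : ReadPath.FETCH_INDEX_FROM_LEADER;
    }

    public static void main(String[] args) {
        // Values from the example: safe time 1, read timestamp 2, heartbeat period 500 ms.
        System.out.println(expectedWaitMs(1, 2, 500));
        System.out.println(choosePath(1, 2, false));
    }
}
```

With the values from the example, expectedWaitMs(1, 2, 500) gives 499, and on an idle cluster choosePath selects the leader index fetch instead of that wait.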



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
