Denis Chudov created IGNITE-17872:
-------------------------------------

             Summary: Fetch commit index on non-primary replicas instead of 
waiting for safe time in case of RO tx on idle cluster
                 Key: IGNITE-17872
                 URL: https://issues.apache.org/jira/browse/IGNITE-17872
             Project: Ignite
          Issue Type: Bug
         Environment: Safe time for non-primary replicas (see IGNITE-17263) 
was conceived as an optimization to avoid unnecessary network hops. Safe time is 
propagated from the primary replica via Raft appendEntries messages. When RW 
transactions put a constant load on the cluster, these messages refresh safe 
time on the replicas frequently enough. On an idle cluster, or a cluster with a 
read-only load, safe time is propagated only periodically, via heartbeats. This 
means that if an RO transaction with a read timestamp in the present or future 
tries to read a value from a non-primary replica, it first waits for safe time, 
whose updates are bound to the frequency of heartbeat messages, so the duration 
of the read operation may be close to a full heartbeat period. This latency is 
avoidable and will cause performance issues. 

Example:
Heartbeat period is 500 ms. 
Current safe time on replica is 1.
We are processing a read-only request with timestamp=2. 
The next expected update of safe time, according to the heartbeat period, is at 
1 + 500 = 501.
This means that we have to wait about 501 - 2 = 499 ms (assuming the clock skew 
and ping in the cluster are 0) before proceeding with the RO request.
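
The arithmetic above can be sketched as a small helper. This is an 
illustration only: SafeTimeWaitEstimate and expectedWaitMs are hypothetical 
names, not part of Ignite, and the model assumes zero clock skew, zero network 
latency, and that safe time advances only with heartbeats.

```java
/** Hypothetical helper (not Ignite API): estimates the wait for safe time on an idle cluster. */
final class SafeTimeWaitEstimate {
    /**
     * Assumes safe time advances only in steps of one heartbeat period,
     * with zero clock skew and zero network latency.
     *
     * @param safeTime current safe time on the replica
     * @param readTs read timestamp of the RO request
     * @param now current time on the replica
     * @param heartbeatPeriodMs heartbeat period in milliseconds
     * @return expected wait in milliseconds before the read can proceed
     */
    static long expectedWaitMs(long safeTime, long readTs, long now, long heartbeatPeriodMs) {
        if (readTs <= safeTime) {
            return 0; // Safe time already covers the read timestamp.
        }

        // Number of heartbeats needed until safe time reaches readTs (ceiling division).
        long periods = (readTs - safeTime + heartbeatPeriodMs - 1) / heartbeatPeriodMs;
        long safeTimeAfterUpdate = safeTime + periods * heartbeatPeriodMs;

        return Math.max(0, safeTimeAfterUpdate - now);
    }

    public static void main(String[] args) {
        // Values from the example: safe time 1, read timestamp 2 (the present), period 500 ms.
        System.out.println(expectedWaitMs(1, 2, 2, 500)); // prints 499
    }
}
```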

So, even though safe time is an optimization, we shouldn't rely on it when 
there are no RW transactions affecting the given replica and the timestamp of 
the current RO transaction is greater than safe time. Instead of waiting for 
the safe time update, we should fall back to reading the index from the leader 
to minimize the processing time of the current RO request.
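
A minimal sketch of the proposed decision follows. All names here (RoReadPath, 
Action, choose) are hypothetical and not taken from the Ignite code base. How 
a replica detects that no RW transactions currently affect it is not specified 
in this ticket, so the sketch simply takes that as a boolean input.

```java
/** Hypothetical sketch (not Ignite API) of choosing the read path for an RO request. */
final class RoReadPath {
    enum Action { READ_LOCALLY, WAIT_FOR_SAFE_TIME, FETCH_INDEX_FROM_LEADER }

    /**
     * @param readTs read timestamp of the RO transaction
     * @param safeTime current safe time on the non-primary replica
     * @param rwLoadOnReplica assumption: true if RW transactions are currently
     *        refreshing safe time on this replica via appendEntries
     */
    static Action choose(long readTs, long safeTime, boolean rwLoadOnReplica) {
        if (readTs <= safeTime) {
            // Safe time already covers the read timestamp: no waiting needed.
            return Action.READ_LOCALLY;
        }

        // Under RW load safe time is refreshed often, so waiting is cheap.
        // Otherwise the wait is bound to the heartbeat period, so ask the leader instead.
        return rwLoadOnReplica ? Action.WAIT_FOR_SAFE_TIME : Action.FETCH_INDEX_FROM_LEADER;
    }

    public static void main(String[] args) {
        // Idle cluster from the example: safe time 1, read timestamp 2, no RW load.
        System.out.println(choose(2, 1, false)); // prints FETCH_INDEX_FROM_LEADER
    }
}
```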
            Reporter: Denis Chudov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)
