[ 
https://issues.apache.org/jira/browse/HDDS-14593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Andika updated HDDS-14593:
-------------------------------
    Description: 
When an OM follower is installing a snapshot, its appliedIndex will not 
advance until the snapshot is installed (i.e. the OM DB is fully downloaded 
from the OM leader to the OM follower). This can cause all linearizable 
follower reads on this node to time out at the client timeout (default is 3s). 

Instead of waiting until the client timeout, the OM follower can check whether 
it is currently installing a snapshot and trigger failover immediately if it is.

Ideally, we might also need to fail any pending read requests. However, this 
might need to be done in Ratis.

  was:
When an OM follower is installing a snapshot, its appliedIndex will not 
advance until the snapshot is installed (i.e. the OM DB is fully downloaded 
to the OM follower). This can cause all linearizable follower reads on this 
node to time out at the client timeout (default is 3s). 

Instead of waiting until the client timeout, the OM follower can check whether 
it is currently installing a snapshot and trigger failover immediately if it is.

Ideally, we might also need to fail any pending read requests. However, this 
might need to be done in Ratis.


> OM follower should trigger failover if it is installing snapshot
> ----------------------------------------------------------------
>
>                 Key: HDDS-14593
>                 URL: https://issues.apache.org/jira/browse/HDDS-14593
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>
> When an OM follower is installing a snapshot, its appliedIndex will not 
> advance until the snapshot is installed (i.e. the OM DB is fully downloaded 
> from the OM leader to the OM follower). This can cause all linearizable 
> follower reads on this node to time out at the client timeout (default is 3s). 
> Instead of waiting until the client timeout, the OM follower can check whether 
> it is currently installing a snapshot and trigger failover immediately if it is.
> Ideally, we might also need to fail any pending read requests. However, this 
> might need to be done in Ratis.
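
The fail-fast check described above could be sketched roughly as follows. 
This is only an illustration of the idea, not the actual patch: the class 
FollowerReadGate, its methods, and FailoverException are all hypothetical 
names and do not exist in Ozone or Ratis. It assumes the follower is told 
when a snapshot installation starts and finishes, and that throwing an 
exception is how the client is told to fail over.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; none of these names come from Ozone or Ratis. */
class FollowerReadGate {

  /** Hypothetical signal telling the client to retry on another OM. */
  static class FailoverException extends RuntimeException {
    FailoverException(String message) {
      super(message);
    }
  }

  // True while the OM DB is being downloaded from the leader.
  private final AtomicBoolean installingSnapshot = new AtomicBoolean(false);

  /** Assumed hook: called when snapshot installation begins. */
  void onSnapshotInstallStarted() {
    installingSnapshot.set(true);
  }

  /** Assumed hook: called when snapshot installation completes. */
  void onSnapshotInstallFinished() {
    installingSnapshot.set(false);
  }

  /**
   * Called before serving a linearizable follower read. Rejects the read
   * immediately instead of letting it wait for appliedIndex to advance.
   */
  void checkCanServeLinearizableRead() {
    if (installingSnapshot.get()) {
      throw new FailoverException(
          "OM follower is installing a snapshot; retry on another OM");
    }
  }
}
```

With a gate like this, a read arriving during snapshot installation is 
rejected right away rather than after the 3s client timeout. Reads that were 
already pending before the installation started are not covered by this 
sketch; as noted above, failing those might need support in Ratis.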



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

