[ https://issues.apache.org/jira/browse/HDDS-14593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Andika updated HDDS-14593:
-------------------------------
Description:
While an OM follower is installing a snapshot, its appliedIndex does not
advance until the installation completes (i.e. the OM DB is fully downloaded
from the OM leader to the OM follower). As a result, every linearizable
follower read on this node can time out at the client timeout (default is 3s).
Instead of waiting for the client timeout, the OM follower can check whether it
is currently installing a snapshot and, if so, trigger failover immediately.
Ideally, we might also need to fail any pending read requests. However, this
might need to be done in Ratis.
was:
When OM follower is installing snapshot, its appliedIndex will not advance
until the snapshot is installed (i.e. OM DB is fully downloaded to the OM
follower). This can cause all linearizable follower read on this node to
timeout due to the client timeout (default is 3s).
Instead of waiting until the client timeout, OM follower can check whether it
is currently installing snapshot and trigger failover immediately if it does.
Ideally, we might also need to fail any pending read requests. However, this
might need to be done in Ratis.
> OM follower should trigger failover if it is installing snapshot
> ----------------------------------------------------------------
>
> Key: HDDS-14593
> URL: https://issues.apache.org/jira/browse/HDDS-14593
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
>
> While an OM follower is installing a snapshot, its appliedIndex does not
> advance until the installation completes (i.e. the OM DB is fully downloaded
> from the OM leader to the OM follower). As a result, every linearizable
> follower read on this node can time out at the client timeout (default is 3s).
> Instead of waiting for the client timeout, the OM follower can check whether it
> is currently installing a snapshot and, if so, trigger failover immediately.
> Ideally, we might also need to fail any pending read requests. However, this
> might need to be done in Ratis.
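
The fail-fast check proposed in the description could look roughly like the
sketch below. It is an illustration only, not Ozone or Ratis code: the class
FollowerReadGate, its methods, and FailoverException are hypothetical names,
and the exception is unchecked purely to keep the example short. The real
change would presumably hook into the OM's existing snapshot-installation
state and its existing failover signalling.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class FollowerReadGate {

  /** Hypothetical stand-in for whatever tells the client to fail over. */
  public static class FailoverException extends RuntimeException {
    public FailoverException(String message) {
      super(message);
    }
  }

  // True while a snapshot from the leader is being downloaded and installed.
  private final AtomicBoolean installingSnapshot = new AtomicBoolean(false);

  public void beginSnapshotInstall() {
    installingSnapshot.set(true);
  }

  public void endSnapshotInstall() {
    installingSnapshot.set(false);
  }

  /**
   * Rejects the read up front if a snapshot is being installed, instead of
   * letting it wait on an appliedIndex that cannot advance.
   */
  public String read(String key) {
    if (installingSnapshot.get()) {
      throw new FailoverException(
          "follower is installing a snapshot; retry on another OM");
    }
    // Placeholder for the real linearizable follower read.
    return "value-of-" + key;
  }

  public static void main(String[] args) {
    FollowerReadGate gate = new FollowerReadGate();
    System.out.println(gate.read("k"));

    gate.beginSnapshotInstall();
    try {
      gate.read("k");
    } catch (FailoverException e) {
      System.out.println("rejected: " + e.getMessage());
    }

    gate.endSnapshotInstall();
    System.out.println(gate.read("k"));
  }
}
```

The sketch only covers reads that arrive during the installation. Reads that
are already pending when the installation starts are not handled here, which
matches the note above that failing them might need to be done in Ratis.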