[ https://issues.apache.org/jira/browse/HDDS-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Doroszlai updated HDDS-9826:
-----------------------------------
Issue Type: Bug (was: Task)
> Fix exception handling if one Datanode is not available (Ratis)
> ---------------------------------------------------------------
>
> Key: HDDS-9826
> URL: https://issues.apache.org/jira/browse/HDDS-9826
> Project: Apache Ozone
> Issue Type: Bug
> Components: SCM Client
> Affects Versions: 1.3.0
> Reporter: Ivan Brusentsev
> Assignee: Ivan Brusentsev
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.4.0
>
>
> When a key is being uploaded by XceiverClientRatis and a Datanode becomes
> unavailable, the client is expected to request a new pipeline and retry the
> upload.
> In fact, before doing so, the client retries the commit check with the
> _MAJORITY_COMMITTED_ replication level, which cannot succeed because by that
> moment the pipeline is already closed.
> XceiverClientRatis has a method watchForCommit(long index), which contains the
> following exception check:
>
> {code:java}
> if (t instanceof GroupMismatchException) {
>   throw e;
> }
> {code}
> GroupMismatchException is thrown by the Ratis client exactly when a Datanode is
> unavailable and further key upload through the current pipeline is impossible.
> But this check never matches, because
> {code:java}
> Throwable t = HddsClientUtils.checkForException(e);{code}
> does not fully unwrap the exception.
> The idea is to fix the lookup of nested exceptions so that the proper one is
> found. This reduces failover latency by approximately 15 seconds.
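The fix amounts to searching the whole cause chain rather than a single level of wrapping. Below is a minimal sketch in plain Java that makes no claim about how HddsClientUtils.checkForException is actually implemented. CauseLookup and findCause are hypothetical names, and an IOException stands in for Ratis's GroupMismatchException so that the example compiles without Ratis on the classpath.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

/**
 * Hypothetical helper (not Ozone's HddsClientUtils): walks the full cause
 * chain instead of looking at only one level of wrapping.
 */
final class CauseLookup {
  private CauseLookup() {}

  /**
   * Returns the first throwable in e's cause chain (including e itself) that
   * is an instance of type, or null if there is none. An identity-based set
   * guards against pathological cyclic cause chains.
   */
  static <T extends Throwable> T findCause(Throwable e, Class<T> type) {
    Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    for (Throwable t = e; t != null && seen.add(t); t = t.getCause()) {
      if (type.isInstance(t)) {
        return type.cast(t);
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Stand-in for the Ratis exception, buried two levels deep: a
    // CompletionException wrapping an ExecutionException wrapping the root.
    IOException root = new IOException("group mismatch (stand-in)");
    Throwable wrapped = new CompletionException(new ExecutionException(root));

    // Unwrapping a single level only reaches the ExecutionException.
    System.out.println(wrapped.getCause().getClass().getSimpleName()); // ExecutionException
    // Walking the whole chain finds the root cause.
    System.out.println(findCause(wrapped, IOException.class) == root); // true
  }
}
```

With a helper like this, the check in watchForCommit could test `findCause(e, GroupMismatchException.class) != null` instead of applying `instanceof` to a partially unwrapped throwable. The client would then fail over to a new pipeline immediately rather than first retrying the MAJORITY_COMMITTED watch.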
--
This message was sent by Atlassian Jira
(v8.20.10#820010)