[ 
https://issues.apache.org/jira/browse/HDDS-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-9826:
-----------------------------------
    Issue Type: Bug  (was: Task)

> Fix exception handling if one Datanode is not available (Ratis)
> ---------------------------------------------------------------
>
>                 Key: HDDS-9826
>                 URL: https://issues.apache.org/jira/browse/HDDS-9826
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM Client
>    Affects Versions: 1.3.0
>            Reporter: Ivan Brusentsev
>            Assignee: Ivan Brusentsev
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.4.0
>
>
> When a key is being uploaded by XceiverClientRatis and a datanode becomes 
> unavailable, the client is expected to request a new pipeline and retry the 
> upload.
> In practice, the client first repeats the commit check with the 
> _MAJORITY_COMMITTED_ replication level, which cannot succeed because the 
> pipeline is already closed at that point.
> XceiverClientRatis has a method, watchForCommit(long index), which contains 
> an exception check:
>  
> {code:java}
> if (t instanceof GroupMismatchException) {
>   throw e;
> }
> {code}
> GroupMismatchException is thrown by the Ratis client exactly when a datanode is 
> not available and further key upload is not possible on the current pipeline.
> But this check does not work, because
> {code:java}
> Throwable t = HddsClientUtils.checkForException(e);{code}
> does not unwrap the exception completely.
> The idea is to fix the lookup of nested exceptions so that the proper one is 
> found. This improves failover latency by approximately 15 seconds.
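
A minimal sketch of the nested-exception lookup described above, assuming the target exception may sit several levels deep in the cause chain. It is not the actual patch: NestedExceptionLookup and findCause are hypothetical names, and GroupMismatchException is redefined locally as a stand-in for the Ratis class so the example compiles without Ratis on the classpath.

```java
import java.io.IOException;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

// Hypothetical stand-in for the Ratis GroupMismatchException, defined locally
// so that this sketch is self-contained.
class GroupMismatchException extends IOException {
  GroupMismatchException(String message) {
    super(message);
  }
}

// Hypothetical helper, not part of Ozone or Ratis.
final class NestedExceptionLookup {
  private NestedExceptionLookup() {
  }

  // Walks the cause chain from root and returns the first throwable that is an
  // instance of the wanted type, or null if none is found.
  // The depth limit guards against cyclic cause chains.
  static <T extends Throwable> T findCause(Throwable root, Class<T> wanted) {
    Throwable current = root;
    for (int depth = 0; current != null && depth < 32; depth++) {
      if (wanted.isInstance(current)) {
        return wanted.cast(current);
      }
      current = current.getCause();
    }
    return null;
  }

  public static void main(String[] args) {
    // Simulates an exception wrapped twice by asynchronous plumbing.
    Throwable wrapped = new ExecutionException(
        new CompletionException(new GroupMismatchException("group not found")));
    GroupMismatchException found = findCause(wrapped, GroupMismatchException.class);
    System.out.println(found != null
        ? "fail over to a new pipeline"
        : "retry the commit check");
  }
}
```

With a lookup of this shape, an instanceof-style check no longer depends on how many wrapper exceptions surround the GroupMismatchException.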



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
