Ivan Brusentsev created HDDS-9826:
-------------------------------------

             Summary: Fix exception handling if one Datanode is not available 
(Ratis)
                 Key: HDDS-9826
                 URL: https://issues.apache.org/jira/browse/HDDS-9826
             Project: Apache Ozone
          Issue Type: Task
          Components: SCM Client
    Affects Versions: 1.3.0
            Reporter: Ivan Brusentsev
            Assignee: Ivan Brusentsev


When a key is being uploaded by XceiverClientRatis and a datanode becomes 
unavailable, the client is expected to request a new pipeline and retry the 
upload.

In practice, before doing so the client retries the commit check with the 
_MAJORITY_COMMITTED_ replication level, which cannot succeed because the 
pipeline is already closed at that point.

XceiverClientRatis has a method watchForCommit(long index), which contains the 
following exception check:

 
{code:java}
if (t instanceof GroupMismatchException) {
  throw e;
}
{code}
GroupMismatchException is thrown by the Ratis client precisely when a datanode 
is unavailable and the key upload cannot continue on the current pipeline.

However, this check does not work, because 
{code:java}
Throwable t = HddsClientUtils.checkForException(e);{code}
does not fully unwrap the exception.

The idea is to fix the lookup of nested exceptions so that the proper one is 
found. This should reduce failover latency by approximately 15 seconds.
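
A minimal sketch of what a complete nested-exception lookup could look like is 
given below. It is an illustration only, not the actual patch: the class 
NestedCauseLookup, the helper findCause and the depth limit are hypothetical 
names and choices, and GroupMismatchException is replaced by a local stand-in 
class so that the example compiles without Ratis on the classpath.

```java
import java.io.IOException;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

final class NestedCauseLookup {

  // Hypothetical stand-in for the Ratis GroupMismatchException,
  // declared here only to keep the sketch self-contained.
  static class GroupMismatchException extends IOException {
    GroupMismatchException(String message) {
      super(message);
    }
  }

  // Assumed upper bound on the chain length; guards against cyclic causes.
  private static final int MAX_DEPTH = 32;

  // Walks the whole cause chain, starting at e itself, and returns the
  // first throwable of the requested type, or null if there is none.
  static <T extends Throwable> T findCause(Throwable e, Class<T> type) {
    int depth = 0;
    for (Throwable t = e; t != null && depth < MAX_DEPTH; t = t.getCause()) {
      if (type.isInstance(t)) {
        return type.cast(t);
      }
      depth++;
    }
    return null;
  }

  public static void main(String[] args) {
    // Two wrapper layers around the exception of interest.
    Throwable wrapped = new ExecutionException(
        new CompletionException(new GroupMismatchException("group not found")));

    GroupMismatchException found =
        findCause(wrapped, GroupMismatchException.class);
    System.out.println("found nested cause: " + (found != null));
  }
}
```

With a lookup of this kind, the instanceof check in watchForCommit would see 
the nested exception no matter how many wrapper layers surround it.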



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
