Ivan Brusentsev created HDDS-9826:
-------------------------------------
Summary: Fix exception handling if one Datanode is not available
(Ratis)
Key: HDDS-9826
URL: https://issues.apache.org/jira/browse/HDDS-9826
Project: Apache Ozone
Issue Type: Task
Components: SCM Client
Affects Versions: 1.3.0
Reporter: Ivan Brusentsev
Assignee: Ivan Brusentsev
When a key is being uploaded through XceiverClientRatis and a datanode becomes
unavailable, the client is expected to request a new pipeline and retry the
upload.
In practice, before doing so, the client first retries the commit check with
the _MAJORITY_COMMITTED_ replication level. That retry cannot succeed, because
the pipeline is already closed at that point.
XceiverClientRatis has a method watchForCommit(long index), which contains the
following exception check:
{code:java}
if (t instanceof GroupMismatchException) {
  throw e;
}
{code}
GroupMismatchException is thrown by the Ratis client exactly when a datanode is
unavailable and the current pipeline can no longer accept the key upload.
However, this check does not work, because
{code:java}
Throwable t = HddsClientUtils.checkForException(e);{code}
does not unwrap the exception completely.
The idea is to fix the lookup of nested exceptions so that the proper one is
found. This improves failover latency by approximately 15 seconds.
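The nested-exception lookup described above could be sketched roughly as
follows. This is only an illustration of walking the full cause chain, not the
actual patch: the helper name findCause and the class GroupMismatchStandIn are
hypothetical, and the stand-in class replaces the real Ratis
GroupMismatchException so that the snippet compiles without Ratis on the
classpath.

```java
import java.util.Optional;

final class CauseChainSketch {

  /** Hypothetical stand-in for the Ratis GroupMismatchException. */
  static class GroupMismatchStandIn extends java.io.IOException {
    GroupMismatchStandIn(String message) {
      super(message);
    }
  }

  /**
   * Walks the cause chain starting at {@code top} and returns the first
   * throwable that is an instance of {@code type}, if there is one.
   * The depth limit is an assumption that guards against cyclic chains.
   */
  static <T extends Throwable> Optional<T> findCause(Throwable top, Class<T> type) {
    int depth = 0;
    for (Throwable t = top; t != null && depth < 32; t = t.getCause(), depth++) {
      if (type.isInstance(t)) {
        return Optional.of(type.cast(t));
      }
    }
    return Optional.empty();
  }

  private CauseChainSketch() {
  }
}
```

With a helper of this kind, the check in watchForCommit could test whether
findCause(e, GroupMismatchException.class) is present instead of inspecting
only a partially unwrapped throwable, so that the exception is recognised even
when it is wrapped in several layers (for example ExecutionException around
CompletionException).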