[
https://issues.apache.org/jira/browse/HDDS-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18042078#comment-18042078
]
Tsz-wo Sze commented on HDDS-14040:
-----------------------------------
[~sumitagrawl], thanks for working on this! Some questions:
bq. Ozone client Ratis receives an exception such as AlreadyClosedException, and
keeps retrying
AlreadyClosedException is an exception thrown after a close. What has caused it
to close?
bq. It exits only on GroupMismatchException
Why does it get a GroupMismatchException?
> Ozone client hang for data write in failure scenario
> ----------------------------------------------------
>
> Key: HDDS-14040
> URL: https://issues.apache.org/jira/browse/HDDS-14040
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Sumit Agrawal
> Assignee: Sumit Agrawal
> Priority: Major
> Labels: pull-request-available
>
> When a datanode is full and has no space left:
> * when the Ozone client tries to write data to the datanode, the write fails
> because the disk is out of space.
> * the Ozone client keeps retrying the same request for a long time, e.g. 5+
> minutes.
>
> With debugging:
> * The DN server throws StorageContainerException to Ratis, where it is hidden.
> * Ozone client Ratis receives an exception such as AlreadyClosedException, and
> keeps retrying.
> * It exits only on GroupMismatchException.
> * The client never receives a proper error that it could check to avoid
> retrying.
> Alternate solution:
> Instead of the DN server throwing an exception to Ratis, return a failure
> response. With this, the client receives a proper error and performs the
> retries below, each failing immediately:
> # On every block write failure, retry 2 more times (i.e. 3 attempts in total)
> # Then allocate another block and retry (6 times)
> # So overall, 3*6 = 18 attempts for a continuously failing write chunk
>
>
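The retry budget described in the quoted issue can be sketched roughly as below. This is only an illustration of the arithmetic, not the Ozone or Ratis implementation: the class BoundedRetrySketch, its constants and the writeChunk predicate are hypothetical names, and it assumes "6 times" means six block allocations in total, each with up to three write attempts.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch; none of these names come from Ozone or Ratis. */
final class BoundedRetrySketch {
    // Assumption: 1 initial write + 2 retries per block.
    static final int ATTEMPTS_PER_BLOCK = 3;
    // Assumption: at most 6 blocks are allocated for one chunk write.
    static final int MAX_BLOCK_ALLOCATIONS = 6;

    /** Whether the write succeeded and how many chunk writes were attempted. */
    static final class Outcome {
        final boolean success;
        final int attempts;

        Outcome(boolean success, int attempts) {
            this.success = success;
            this.attempts = attempts;
        }
    }

    /**
     * @param writeChunk stand-in for a chunk write that returns a failure
     *        response (false) instead of throwing; it receives the 1-based
     *        overall attempt number
     */
    static Outcome write(IntPredicate writeChunk) {
        int attempts = 0;
        // Outer loop: each iteration models allocating a new block.
        for (int block = 0; block < MAX_BLOCK_ALLOCATIONS; block++) {
            // Inner loop: bounded attempts against the current block.
            for (int i = 0; i < ATTEMPTS_PER_BLOCK; i++) {
                attempts++;
                if (writeChunk.test(attempts)) {
                    return new Outcome(true, attempts);
                }
            }
        }
        // Every attempt failed: give up with a bounded total instead of hanging.
        return new Outcome(false, attempts);
    }
}
```

Calling BoundedRetrySketch.write(n -> false) models a datanode that always reports a failure; under the assumptions above the loop gives up after 3*6 = 18 attempts.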