[
https://issues.apache.org/jira/browse/HDDS-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sumit Agrawal resolved HDDS-14040.
----------------------------------
Fix Version/s: 2.2.0
Resolution: Fixed
> Ozone client hang for data write in failure scenario
> ----------------------------------------------------
>
> Key: HDDS-14040
> URL: https://issues.apache.org/jira/browse/HDDS-14040
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Sumit Agrawal
> Assignee: Sumit Agrawal
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.2.0
>
>
> When a datanode is full and has no free space:
> * When the Ozone client tries to write data to the datanode, the write fails
> because the disk is out of space.
> * The Ozone client keeps retrying the same request for a long time (5+
> minutes).
>
> With debug,
> * The DN server throws StorageContainerException to Ratis, where it gets
> hidden.
> * The Ratis client inside the Ozone client receives an exception such as
> AlreadyClosedException and keeps retrying.
> * It exits the retry loop only on GroupMismatchException.
> * The client never receives a proper error that it could check to avoid
> retrying.
> Alternate solution:
> Instead of the DN server throwing an exception to Ratis, return a failure
> response. With this, the client receives a proper error and fails promptly
> after the following bounded retries:
> # On each block write failure, retry 2 more times (i.e. 3 attempts in total).
> # Then allocate another block and retry (6 times).
> # So overall, a continuously failing write chunk is attempted 3*6 = 18 times.
>
>
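The retry budget described above can be sketched roughly as follows. This is
only an illustration, not the actual Ozone client code: the class, interface
and constant names are hypothetical, the error string is just an example
value, and reading "3*6" as 3 attempts per block across 6 blocks is an
assumption taken from the description. The stub datanode returns a failure
result instead of throwing, so the client sees the error and gives up after a
bounded number of attempts.

```java
public class WriteRetrySketch {
    // Hypothetical constants, taken from the numbers in the description.
    public static final int ATTEMPTS_PER_BLOCK = 3; // 1 initial write + 2 retries
    public static final int MAX_BLOCKS = 6;         // blocks tried before giving up

    /** A write result returned as a value rather than thrown as an exception. */
    public static final class WriteResult {
        public final boolean success;
        public final String error; // null on success
        public WriteResult(boolean success, String error) {
            this.success = success;
            this.error = error;
        }
    }

    /** Hypothetical stand-in for the datanode write path. */
    public interface Datanode {
        WriteResult writeChunk(int blockId, byte[] data);
    }

    /** Final outcome of the whole write, including how many attempts were made. */
    public static final class Outcome {
        public final boolean success;
        public final int attempts;
        public final String lastError;
        public Outcome(boolean success, int attempts, String lastError) {
            this.success = success;
            this.attempts = attempts;
            this.lastError = lastError;
        }
    }

    /** Tries each block up to ATTEMPTS_PER_BLOCK times, for at most MAX_BLOCKS blocks. */
    public static Outcome write(Datanode dn, byte[] data) {
        int attempts = 0;
        String lastError = null;
        for (int block = 0; block < MAX_BLOCKS; block++) {
            for (int attempt = 0; attempt < ATTEMPTS_PER_BLOCK; attempt++) {
                attempts++;
                WriteResult result = dn.writeChunk(block, data);
                if (result.success) {
                    return new Outcome(true, attempts, null);
                }
                lastError = result.error; // visible to the client, so it can stop
            }
            // Falling through here models "allocate another block and retry".
        }
        return new Outcome(false, attempts, lastError);
    }

    public static void main(String[] args) {
        // A datanode that is always full: every write reports a failure.
        Datanode full = (blockId, data) -> new WriteResult(false, "DISK_OUT_OF_SPACE");
        Outcome outcome = write(full, new byte[] {1});
        // prints "18 attempts, last error: DISK_OUT_OF_SPACE"
        System.out.println(outcome.attempts + " attempts, last error: " + outcome.lastError);
    }
}
```

With a datanode that always fails, the loop stops after 3*6 = 18 attempts
instead of retrying for minutes, which matches the behaviour the description
expects from the fix.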