[jira] [Commented] (HDDS-14040) Ozone client hang for data write in failure scenario

Tsz-wo Sze (Jira) Mon, 01 Dec 2025 21:59:04 -0800


    [ 
https://issues.apache.org/jira/browse/HDDS-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18042078#comment-18042078
 ]


Tsz-wo Sze commented on HDDS-14040:
-----------------------------------

[~sumitagrawl], thanks for working on this!  Some questions:

bq. Ozone client Ratis receives exceptions such as AlreadyClosedException, and 
keeps retrying

AlreadyClosedException is thrown after a close.  What caused the close?

bq. It exits only on GroupMismatchException

Why does it get a GroupMismatchException?

> Ozone client hang for data write in failure scenario
> ----------------------------------------------------
>
>                 Key: HDDS-14040
>                 URL: https://issues.apache.org/jira/browse/HDDS-14040
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Sumit Agrawal
>            Assignee: Sumit Agrawal
>            Priority: Major
>              Labels: pull-request-available
>
> When a datanode is full and has no space left,
>  * when the Ozone client tries to write data to the datanode, the write 
> fails because the disk is out of space.
>  * the Ozone client keeps retrying the same request for a long time, such 
> as 5+ minutes.
>  
> Debugging shows:
>  * The DN server throws a StorageContainerException to Ratis, where it gets 
> hidden.
>  * Ozone client Ratis receives exceptions such as AlreadyClosedException, and 
> keeps retrying.
>  * It exits only on GroupMismatchException.
>  * The client never receives a proper error that it could check to avoid 
> retrying.
> Alternate solution:
> Instead of the DN server throwing an exception to Ratis, return a failure 
> response. With this, the client receives a proper failure, each attempt 
> fails immediately, and the client retries as follows:
>  # On every block write failure, retry 2 more times (i.e. 3 attempts in 
> total).
>  # Then try to allocate another block and retry (6 times).
>  # So overall, 3*6 = 18 attempts for a continuously failing write chunk.
>  
>  
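The retry arithmetic in the alternate solution above can be sketched as follows. This is only an illustration, not Ozone or Ratis code: the class, enum and method names are all hypothetical, and it assumes "3 retries" means 3 attempts per block and "6 times" means 6 block allocations in total. The datanode is modelled as a function that returns a failure result instead of throwing.

```java
import java.util.function.IntFunction;

/** Hypothetical sketch; none of these names come from Ozone or Ratis. */
final class WriteRetrySketch {
    static final int ATTEMPTS_PER_BLOCK = 3; // 1 initial try + 2 retries (assumed)
    static final int MAX_BLOCKS = 6;         // block allocations before giving up (assumed)

    /** Failure is reported as a result value rather than a thrown exception. */
    enum WriteResult { SUCCESS, DISK_OUT_OF_SPACE }

    /** Outcome of writing one chunk: whether it succeeded and how many attempts were made. */
    static final class Outcome {
        final boolean success;
        final int attempts;

        Outcome(boolean success, int attempts) {
            this.success = success;
            this.attempts = attempts;
        }
    }

    /**
     * Writes one chunk with bounded retries.
     * datanodeWrite receives the 1-based overall attempt number and returns the result.
     */
    static Outcome writeChunk(IntFunction<WriteResult> datanodeWrite) {
        int attempts = 0;
        for (int block = 1; block <= MAX_BLOCKS; block++) {          // allocate a new block
            for (int attempt = 1; attempt <= ATTEMPTS_PER_BLOCK; attempt++) {
                attempts++;
                if (datanodeWrite.apply(attempts) == WriteResult.SUCCESS) {
                    return new Outcome(true, attempts);
                }
                // Failure result seen immediately: retry on the same block, no long wait.
            }
        }
        return new Outcome(false, attempts);                          // give up after 3*6 attempts
    }

    public static void main(String[] args) {
        // A datanode that is always full: every attempt fails.
        Outcome full = writeChunk(attempt -> WriteResult.DISK_OUT_OF_SPACE);
        System.out.println(full.attempts + " attempts, success=" + full.success);
        // prints "18 attempts, success=false"
    }
}
```

Because the loop is bounded, a permanently full datanode makes the write fail after a fixed number of attempts instead of retrying for minutes.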



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

