[ 
https://issues.apache.org/jira/browse/HDDS-14040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Agrawal resolved HDDS-14040.
----------------------------------
    Fix Version/s: 2.2.0
       Resolution: Fixed

> Ozone client hang for data write in failure scenario
> ----------------------------------------------------
>
>                 Key: HDDS-14040
>                 URL: https://issues.apache.org/jira/browse/HDDS-14040
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Sumit Agrawal
>            Assignee: Sumit Agrawal
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.2.0
>
>
> When a datanode is full and has no free space:
>  * When the Ozone client tries to write data to the datanode, the write 
> fails because the disk is out of space.
>  * The Ozone client keeps retrying the same request for a long time 
> (5+ minutes).
>  
> Debugging shows:
>  * The DN server throws StorageContainerException to Ratis, where it is 
> hidden.
>  * The Ratis client inside the Ozone client receives exceptions such as 
> AlreadyClosedException and keeps retrying.
>  * The retry exits only on GroupMismatchException.
>  * The client never receives a proper error that it could check to avoid 
> retrying.
> Alternate solution:
> Instead of the DN server throwing an exception to Ratis, return a failure 
> response.
> With this, the client receives a proper error and retries as follows, with 
> each attempt failing immediately:
>  # On every block write failure, retry 2 more times (i.e. 3 attempts in 
> total).
>  # Then allocate another block and retry (6 times).
>  # So overall, a continuously failing write chunk is retried 3*6 = 18 times.
>  
>  
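
The retry arithmetic in the description can be sketched as a nested loop. This is only an illustration under stated assumptions, not the Ozone or Ratis implementation: `WriteRetrySketch`, `ChunkWriter`, `WriteResult`, `Outcome`, and both constants are hypothetical names, and "retry (6 times)" is read here as 6 block allocations with 3 write attempts each. The writer returns a failure value instead of throwing, which is the idea behind the alternate solution.

```java
public class WriteRetrySketch {
    // Assumed limits, taken from the "3*6" figure in the issue description.
    static final int ATTEMPTS_PER_BLOCK = 3;
    static final int MAX_BLOCKS = 6;

    /** Hypothetical result: the server reports failure as a value, not an exception. */
    enum WriteResult { SUCCESS, DISK_OUT_OF_SPACE }

    /** Hypothetical stand-in for one write-chunk call against a block. */
    interface ChunkWriter {
        WriteResult write(int blockIndex, int attempt);
    }

    /** Outcome of the whole write: whether it succeeded and how many attempts it took. */
    static final class Outcome {
        final boolean success;
        final int attempts;

        Outcome(boolean success, int attempts) {
            this.success = success;
            this.attempts = attempts;
        }
    }

    static Outcome writeWithRetry(ChunkWriter writer) {
        int attempts = 0;
        // Outer loop: each iteration models allocating a fresh block.
        for (int block = 0; block < MAX_BLOCKS; block++) {
            // Inner loop: first attempt plus 2 retries on the same block.
            for (int attempt = 0; attempt < ATTEMPTS_PER_BLOCK; attempt++) {
                attempts++;
                if (writer.write(block, attempt) == WriteResult.SUCCESS) {
                    return new Outcome(true, attempts);
                }
            }
        }
        // Every attempt failed immediately, so the caller gets a bounded failure.
        return new Outcome(false, attempts);
    }

    public static void main(String[] args) {
        // A datanode that is always full: every write fails at once.
        Outcome o = writeWithRetry((block, attempt) -> WriteResult.DISK_OUT_OF_SPACE);
        System.out.println("success=" + o.success + " attempts=" + o.attempts);
        // prints "success=false attempts=18"
    }
}
```

Because each failure comes back as an explicit response, the loop is bounded by the two constants rather than by a time-based retry.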



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

