Sumit Agrawal created HDDS-14040:
------------------------------------

             Summary: Ozone client hang for data write in failure scenario
                 Key: HDDS-14040
                 URL: https://issues.apache.org/jira/browse/HDDS-14040
             Project: Apache Ozone
          Issue Type: Improvement
            Reporter: Sumit Agrawal
            Assignee: Sumit Agrawal


When a datanode is full and has no free space,
 * when the Ozone client tries to write data to the datanode, the write fails because the disk is out of space.
 * the Ozone client keeps retrying the same request for a long time (5+ minutes).

 

Debugging shows:
 * The DN server throws a StorageContainerException to Ratis, where it is hidden.
 * The Ratis client inside the Ozone client receives exceptions such as AlreadyClosedException and keeps retrying.
 * The retry loop exits only on GroupMismatchException.
 * The client never receives a proper error that it could check to avoid retrying.

Alternate solution:

Instead of the DN server throwing an exception to Ratis, have it return a failure response.
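
The idea can be illustrated with a minimal sketch. Every name in it (WriteChunkSketch, Response, StorageException, handleWriteChunk) is hypothetical and stands in for the real datanode code; it does not use the Ozone or Ratis API. It only shows the pattern of catching the storage error at the handler boundary and turning it into a failure response that the caller can inspect.

```java
// Hypothetical sketch: none of these types are Ozone or Ratis classes.
final class WriteChunkSketch {

    /** Result codes carried back to the client (assumed for illustration). */
    enum Result { SUCCESS, DISK_OUT_OF_SPACE }

    /** A response object the client can inspect instead of catching an exception. */
    static final class Response {
        final Result result;
        final String message;

        Response(Result result, String message) {
            this.result = result;
            this.message = message;
        }

        boolean isSuccess() {
            return result == Result.SUCCESS;
        }
    }

    /** Stand-in for the storage-level exception raised by the datanode. */
    static final class StorageException extends Exception {
        StorageException(String message) {
            super(message);
        }
    }

    /** Simulated disk write: fails when the chunk does not fit in the free space. */
    static void writeToDisk(long freeBytes, int chunkSize) throws StorageException {
        if (chunkSize > freeBytes) {
            throw new StorageException("disk out of space");
        }
    }

    /**
     * Handler boundary: the exception is not propagated to the replication
     * layer; it is converted into an explicit failure response.
     */
    static Response handleWriteChunk(long freeBytes, int chunkSize) {
        try {
            writeToDisk(freeBytes, chunkSize);
            return new Response(Result.SUCCESS, "");
        } catch (StorageException e) {
            return new Response(Result.DISK_OUT_OF_SPACE, e.getMessage());
        }
    }
}
```

With this shape, a call such as handleWriteChunk(0, 512) returns a response whose result is DISK_OUT_OF_SPACE, so the caller can decide whether retrying makes sense.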

With this, the client receives a proper failure and each retry fails immediately, as follows:
 # On every block write failure, retry 2 more times (i.e. 3 attempts in total).
 # Then try to allocate another block and retry (6 times).
 # So overall, 3*6 retries for a continuously failing write chunk.
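
The retry arithmetic above can be sketched as a bounded nested loop. The class name, constants, and writeWithRetry method are hypothetical and only model the counts described in this issue (3 attempts per block, 6 blocks); they are not the Ozone client's actual retry policy code.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the retry budget; not the Ozone client implementation.
final class RetryBudgetSketch {

    static final int ATTEMPTS_PER_BLOCK = 3; // first try plus 2 retries
    static final int MAX_BLOCKS = 6;         // block allocations that are tried

    /** Outcome of the whole write: whether it succeeded and how many attempts were made. */
    static final class Outcome {
        final boolean success;
        final int attempts;

        Outcome(boolean success, int attempts) {
            this.success = success;
            this.attempts = attempts;
        }
    }

    /**
     * Tries the write up to ATTEMPTS_PER_BLOCK times per block and moves on
     * to a newly allocated block after each exhausted block, for at most
     * MAX_BLOCKS blocks. The supplier returns true when the write succeeds.
     */
    static Outcome writeWithRetry(BooleanSupplier writeChunk) {
        int attempts = 0;
        for (int block = 0; block < MAX_BLOCKS; block++) {
            for (int attempt = 0; attempt < ATTEMPTS_PER_BLOCK; attempt++) {
                attempts++;
                if (writeChunk.getAsBoolean()) {
                    return new Outcome(true, attempts);
                }
            }
            // All attempts on this block failed; the next iteration models
            // allocating another block and trying again.
        }
        return new Outcome(false, attempts);
    }
}
```

When every attempt fails immediately, the loop ends after 3*6 = 18 attempts instead of retrying for minutes.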

 

 


