Sumit Agrawal created HDDS-14040:
------------------------------------
Summary: Ozone client hang for data write in failure scenario
Key: HDDS-14040
URL: https://issues.apache.org/jira/browse/HDDS-14040
Project: Apache Ozone
Issue Type: Improvement
Reporter: Sumit Agrawal
Assignee: Sumit Agrawal
When a datanode is full and has no free space left:
* When the Ozone client tries to write data to the datanode, the write fails
because the disk is out of space.
* The Ozone client keeps retrying the same request for a long time, for
example 5+ minutes.
Debugging shows:
* The DN server throws StorageContainerException to Ratis, where it is hidden.
* The Ratis layer in the Ozone client receives an exception such as
AlreadyClosedException and keeps retrying.
* The retry loop exits only on GroupMismatchException.
* The client never receives a proper error that it could check to avoid
retrying.
Alternate solution:
Instead of the DN server throwing an exception to Ratis, return a failure
response. With this, the client receives a proper error, each attempt fails
immediately, and the retries proceed as follows:
# On every block write failure, retry 2 more times (i.e. 3 attempts in total).
# Then allocate another block and retry (6 times).
# So overall, 3*6 = 18 attempts for a continuously failing write chunk.
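
The bounded retry described above can be sketched roughly as follows. This is an illustration only, not the Ozone client code: the class BoundedWriteRetrySketch, the Result enum, and the ChunkWriter interface are hypothetical names, and the sketch assumes that "6 times" means 6 block allocations in total, each tried 3 times. The writer reports failure as a return value rather than by throwing, which is what lets the caller count attempts and stop.

```java
public class BoundedWriteRetrySketch {

    /** Hypothetical outcome of one write-chunk attempt; not an Ozone type. */
    enum Result { SUCCESS, DISK_OUT_OF_SPACE }

    /** Hypothetical stand-in for a datanode write that returns a failure response. */
    interface ChunkWriter {
        Result write(int blockIndex, int attempt);
    }

    // Assumed limits, taken from the numbers in the issue description.
    static final int ATTEMPTS_PER_BLOCK = 3; // first try plus 2 retries
    static final int MAX_BLOCKS = 6;         // block allocations tried in total

    /**
     * Tries each block up to ATTEMPTS_PER_BLOCK times, moving to a newly
     * allocated block after each exhausted one. Returns the number of
     * attempts made; stops at the first success.
     */
    static int writeWithBoundedRetry(ChunkWriter writer) {
        int attempts = 0;
        for (int block = 0; block < MAX_BLOCKS; block++) {
            for (int attempt = 0; attempt < ATTEMPTS_PER_BLOCK; attempt++) {
                attempts++;
                if (writer.write(block, attempt) == Result.SUCCESS) {
                    return attempts;
                }
            }
        }
        return attempts; // every attempt failed; the caller can now surface the error
    }

    public static void main(String[] args) {
        // Every attempt fails, as with a full disk: the loop ends after 3*6 attempts.
        int failed = writeWithBoundedRetry((block, attempt) -> Result.DISK_OUT_OF_SPACE);
        System.out.println("attempts on continuous failure: " + failed);

        // The second block succeeds on its first attempt: 3 failures plus 1 success.
        int recovered = writeWithBoundedRetry(
            (block, attempt) -> block == 1 ? Result.SUCCESS : Result.DISK_OUT_OF_SPACE);
        System.out.println("attempts when second block succeeds: " + recovered);
    }
}
```

Because each attempt fails immediately, the whole sequence finishes quickly instead of hanging for minutes.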