[
https://issues.apache.org/jira/browse/HDDS-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598466#comment-16598466
]
Mukul Kumar Singh commented on HDDS-263:
----------------------------------------
Thanks for updating the patch [~shashikant]. The patch looks really good to me.
Some very minor comments.
1) For BlockNotCommittedException, the result field is not required; it will
always be ContainerProtos.Result.BLOCK_NOT_COMMITTED.
2) Please fix the checkstyle issues.
3) DistributedStorageHandler.java:25,26,72: unused imports.
4) Can we rename
ozone.client.interval.between.retries.on.block.commit.exception ->
ozone.client.retry.interval?
5) Likewise, ozone.client.max.retries.on.block.commit.exception ->
ozone.client.max.retries.
Also, let's add a note that, for now, BlockNotCommittedException is a reason
for retry.
6) RpcClient.java:32,76: unused imports.
7) RpcClient.java:34: wildcard import.
8) TestCloseContainerHandlingByClient: let's generate a random number here, so
that different iterations of the test try a different number of retries.
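
To illustrate comment 1, here is a minimal sketch of an exception whose result
code is fixed by its type, so the constructor takes only a message. The Result
enum and the ContainerException base class below are hypothetical stand-ins
written for this example; they are not the actual classes in the patch.

```java
import java.io.IOException;

// Hypothetical stand-in for ContainerProtos.Result; only the value used here.
enum Result { BLOCK_NOT_COMMITTED }

// Hypothetical base class carrying a result code. It stands in for whatever
// container exception type the real class extends.
class ContainerException extends IOException {
  private final Result result;

  ContainerException(String message, Result result) {
    super(message);
    this.result = result;
  }

  Result getResult() {
    return result;
  }
}

// The result is implied by the exception type, so callers pass only a message.
class BlockNotCommittedException extends ContainerException {
  BlockNotCommittedException(String message) {
    super(message, Result.BLOCK_NOT_COMMITTED);
  }
}
```

With this shape, a caller cannot construct the exception with a mismatched
result code.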
> Add retries in Ozone Client to handle BLOCK_NOT_COMMITTED Exception
> -------------------------------------------------------------------
>
> Key: HDDS-263
> URL: https://issues.apache.org/jira/browse/HDDS-263
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Blocker
> Fix For: 0.2.1
>
> Attachments: HDDS-263.00.patch, HDDS-263.01.patch, HDDS-263.02.patch,
> HDDS-263.03.patch, HDDS-263.04.patch
>
>
> While Ozone client writes are in progress, a container on a datanode can get
> closed because of node failures, a disk running out of space, etc. In such
> situations, the client write will fail with CLOSED_CONTAINER_IO. In this case,
> the Ozone client should try to get the committed block length for the pending
> open blocks and update the OzoneManager. While trying to get the committed
> block length, it may fail with a BLOCK_NOT_COMMITTED exception because, as
> part of its transition from the CLOSING to the CLOSED state, the container
> commits all open blocks one by one. In such cases, the client needs to retry
> getting the committed block length for a fixed number of attempts and
> eventually throw the exception to the application if it is not able to
> successfully get and update the length in the OzoneManager. This Jira aims to
> address this.
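
The retry behaviour described above can be sketched roughly as follows. This
is an illustration under stated assumptions, not the code in the patch: the
CommittedLengthRetry class, the LengthSupplier interface, and the nested
exception are hypothetical names. The maxRetries and retryIntervalMs
parameters are assumed to be read from settings such as
ozone.client.max.retries and ozone.client.retry.interval.

```java
import java.io.IOException;

final class CommittedLengthRetry {

  /** Hypothetical stand-in for the BLOCK_NOT_COMMITTED failure. */
  static class BlockNotCommittedException extends IOException {
    BlockNotCommittedException(String message) {
      super(message);
    }
  }

  /** Hypothetical call that asks a datanode for a block's committed length. */
  interface LengthSupplier {
    long getCommittedLength() throws IOException;
  }

  private CommittedLengthRetry() { }

  /**
   * Makes one initial attempt plus up to maxRetries retries, sleeping
   * retryIntervalMs between attempts. Only BlockNotCommittedException is
   * retried; any other IOException propagates immediately.
   */
  static long getWithRetries(LengthSupplier supplier, int maxRetries,
      long retryIntervalMs) throws IOException {
    int retriesLeft = maxRetries;
    while (true) {
      try {
        return supplier.getCommittedLength();
      } catch (BlockNotCommittedException e) {
        if (retriesLeft <= 0) {
          throw e; // Retries exhausted: surface the failure to the caller.
        }
        retriesLeft--;
        try {
          Thread.sleep(retryIntervalMs);
        } catch (InterruptedException ie) {
          // Restore the interrupt flag and stop retrying.
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while waiting to retry", ie);
        }
      }
    }
  }
}
```

A fixed interval keeps the sketch simple; the container is expected to commit
its open blocks shortly after it starts closing, so a short bounded wait is
assumed to be enough here.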