[ 
https://issues.apache.org/jira/browse/HDDS-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598466#comment-16598466
 ] 

Mukul Kumar Singh commented on HDDS-263:
----------------------------------------

Thanks for updating the patch [~shashikant]. The patch looks really good to me. 
Some very minor comments.

1) For BlockNotCommittedException, the result field is not required; it will 
always be ContainerProtos.Result.BLOCK_NOT_COMMITTED.
2) Please fix the checkstyle issues.
3) DistributedStorageHandler.java:25,26,72: unused imports.
4) Can we rename 
ozone.client.interval.between.retries.on.block.commit.exception to 
ozone.client.retry.interval?
5) Likewise, rename ozone.client.max.retries.on.block.commit.exception to 
ozone.client.max.retries.
Also, let's add that, for now, BlockNotCommittedException is a reason for retry.
6) RpcClient.java:32,76: unused imports.
7) RpcClient.java:34: wildcard import.
8) TestCloseContainerHandlingByClient: let's generate a random number here, so 
that different iterations of the test try a different number of retries.
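
To illustrate comment 1, here is a minimal sketch of an exception type that 
fixes its own result code, so callers pass only a message. The Result enum and 
the StorageContainerException base class below are simplified stand-ins written 
for this example, not the actual classes from the patch, and the constructor 
shape is an assumption.

```java
import java.io.IOException;

// Stand-in for ContainerProtos.Result; only the value used here is modeled.
enum Result { BLOCK_NOT_COMMITTED }

// Simplified stand-in for a base exception that carries a result code.
class StorageContainerException extends IOException {
  private final Result result;

  StorageContainerException(String message, Result result) {
    super(message);
    this.result = result;
  }

  Result getResult() {
    return result;
  }
}

// The result code is implied by the exception type, so the constructor
// takes only a message and no result argument.
class BlockNotCommittedException extends StorageContainerException {
  BlockNotCommittedException(String message) {
    super(message, Result.BLOCK_NOT_COMMITTED);
  }
}
```

With this shape, a call site cannot construct the exception with a mismatched 
result code.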



> Add retries in Ozone Client to handle BLOCK_NOT_COMMITTED Exception
> -------------------------------------------------------------------
>
>                 Key: HDDS-263
>                 URL: https://issues.apache.org/jira/browse/HDDS-263
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Client
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>            Priority: Blocker
>             Fix For: 0.2.1
>
>         Attachments: HDDS-263.00.patch, HDDS-263.01.patch, HDDS-263.02.patch, 
> HDDS-263.03.patch, HDDS-263.04.patch
>
>
> While Ozone client writes are in progress, a container on a datanode can get 
> closed because of node failures, a disk running out of space, etc. In such 
> situations, the client write will fail with CLOSED_CONTAINER_IO. In this case, 
> the Ozone client should try to get the committed block length for the pending 
> open blocks and update the OzoneManager. The attempt to get the committed block 
> length may fail with a BLOCK_NOT_COMMITTED exception because, as part of the 
> container's transition from the CLOSING to the CLOSED state, all open blocks 
> are committed one by one. In such cases, the client needs to retry getting the 
> committed block length for a fixed number of attempts and eventually throw the 
> exception to the application if it is not able to successfully get and update 
> the length in the OzoneManager. This Jira aims to address this.
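
The retry behaviour described in the issue could be sketched roughly as 
follows. This is not the code from the patch: CommittedLengthRetry, 
LengthFetcher, and getWithRetry are hypothetical names, the nested exception is 
a stand-in, and maxRetries and retryIntervalMs simply mirror the two 
configuration values discussed in the review.

```java
import java.io.IOException;

final class CommittedLengthRetry {

  // Stand-in for the exception raised while a block is still uncommitted.
  static class BlockNotCommittedException extends IOException {
    BlockNotCommittedException(String message) {
      super(message);
    }
  }

  // Hypothetical hook for the datanode call that returns the committed length.
  interface LengthFetcher {
    long fetch() throws IOException;
  }

  // Makes one initial attempt plus up to maxRetries retries, sleeping
  // retryIntervalMs between attempts.
  static long getWithRetry(LengthFetcher fetcher, int maxRetries,
      long retryIntervalMs) throws IOException, InterruptedException {
    int retries = 0;
    while (true) {
      try {
        return fetcher.fetch();
      } catch (BlockNotCommittedException e) {
        // Only this exception is retried; any other IOException
        // propagates to the caller immediately.
        if (retries >= maxRetries) {
          // Retries exhausted: surface the failure to the application.
          throw e;
        }
        retries++;
        Thread.sleep(retryIntervalMs);
      }
    }
  }
}
```

A caller would pass a fetcher that wraps the actual get-committed-block-length 
request and, on success, use the returned length to update the OzoneManager.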



