[ https://issues.apache.org/jira/browse/HDDS-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682092#comment-16682092 ]
Shashikant Banerjee commented on HDDS-709:
------------------------------------------
Thanks [~jnp] for the comments.
{noformat}
In checkIfContainerNotOpenException, why do we need to dig through exceptions?
Is it possible to communicate back via protocol?{noformat}
There are two ways for the client to receive an exception. One is to embed the
error code in the ContainerCommandResponse on the datanode and pass it back in
the RaftClientReply message. The other is to set the exception inside the
RaftClientReply, which Ratis converts to a StateMachineException and then wraps
in a CompletionException.
In this case, since the operation fails in the startTransaction phase itself,
the only way to propagate the error to the client is to set the exception in
the TransactionContext. Ratis then wraps the exception inside a
StateMachineException, reporting it as a protocol failure, and sets it inside
the RaftClientReply. There is no ContainerCommandResponse in this case, as the
command never gets executed once startTransaction fails.
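To make the startTransaction path concrete, here is a minimal sketch, assuming simplified stand-ins: `TxContext` is a hypothetical placeholder for the Ratis TransactionContext (it only records an exception), `ContainerNotOpenException` is a local stand-in, and only OPEN containers accept client writes. It shows that the request is rejected before execution, so the exception on the context is the only thing that can travel back to the client.

```java
import java.io.IOException;

public class StartTransactionSketch {

  enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED, UNHEALTHY }

  // Hypothetical stand-in for a Ratis TransactionContext: it only records
  // an exception raised during startTransaction.
  static class TxContext {
    private Exception exception;
    void setException(Exception e) { this.exception = e; }
    Exception getException() { return exception; }
  }

  // Local stand-in for the datanode-side "container not open" error.
  static class ContainerNotOpenException extends IOException {
    ContainerNotOpenException(String msg) { super(msg); }
  }

  /**
   * Rejects a client write to a container that is not OPEN (simplifying
   * assumption). The request is never executed, so no response object is
   * built; the failure is carried only by the exception on the context.
   */
  static TxContext startTransaction(long containerId, State state) {
    TxContext ctx = new TxContext();
    if (state != State.OPEN) {
      ctx.setException(new ContainerNotOpenException(
          "Container " + containerId + " is in " + state + " state"));
    }
    return ctx;
  }

  public static void main(String[] args) {
    // prints "null": an OPEN container accepts the write
    System.out.println(startTransaction(1L, State.OPEN).getException());
    // prints "Container 2 is in CLOSING state"
    System.out.println(startTransaction(2L, State.CLOSING).getException().getMessage());
  }
}
```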
We need to handle the exception on the client, and hence have to dig through
the wrapped exceptions.
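The unwrapping on the client can be sketched as a walk down the cause chain. This is a minimal illustration, not the patch's code: `StateMachineException` and `ContainerNotOpenException` below are hypothetical local stand-ins for the Ratis and Ozone classes, and only `CompletionException` is the real JDK type. The helper returns true if any link in the chain is the "container not open" error.

```java
import java.io.IOException;
import java.util.concurrent.CompletionException;

public class ExceptionUnwrapSketch {

  // Hypothetical stand-in for Ratis' StateMachineException (not the real class).
  static class StateMachineException extends IOException {
    StateMachineException(String msg, Throwable cause) { super(msg, cause); }
  }

  // Hypothetical stand-in for the datanode-side "container not open" error.
  static class ContainerNotOpenException extends IOException {
    ContainerNotOpenException(String msg) { super(msg); }
  }

  /**
   * Walks the cause chain (CompletionException -> StateMachineException -> ...)
   * and returns true if any link is a ContainerNotOpenException.
   * Throwable.getCause() returns null at the end of the chain.
   */
  static boolean checkIfContainerNotOpenException(Throwable t) {
    Throwable cur = t;
    while (cur != null) {
      if (cur instanceof ContainerNotOpenException) {
        return true;
      }
      cur = cur.getCause();
    }
    return false;
  }

  public static void main(String[] args) {
    Throwable wrapped = new CompletionException(
        new StateMachineException("startTransaction failed",
            new ContainerNotOpenException("Container 1 is in CLOSING state")));
    System.out.println(checkIfContainerNotOpenException(wrapped));                     // true
    System.out.println(checkIfContainerNotOpenException(new IOException("disk full"))); // false
  }
}
```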
{noformat}
if (containerState == State.OPEN || containerState == State.CLOSING)
Ideally we should not need this check to mark container UNHEALTHY. For a CLOSED
container, it should not even come to this code path. {noformat}
This check is there to mark the container UNHEALTHY in case applyTransaction
fails while executing on the datanode, as per the discussion in HDDS-579. To
mark a CLOSED container unhealthy, either the client should detect corrupted
blocks and tell SCM to move the container to UNHEALTHY, or the datanode itself
should discover disk failures and mark the container replicas on those disks
unhealthy. These cases are not covered in the scope of this Jira.
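A minimal sketch of the guard being discussed, assuming a hypothetical `ContainerReplica` that holds only its state: on an applyTransaction failure, only OPEN or CLOSING replicas move to UNHEALTHY, while a CLOSED replica is left for the separate detection paths described above. The class and method names are illustrative, not the patch's.

```java
public class MarkUnhealthySketch {

  enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED, UNHEALTHY }

  // Hypothetical, minimal container replica holding only its state.
  static class ContainerReplica {
    private State state;

    ContainerReplica(State initial) { this.state = initial; }

    State getState() { return state; }

    /**
     * Called when applyTransaction fails on the datanode. Only OPEN or
     * CLOSING replicas are moved to UNHEALTHY here; CLOSED (and QUASI_CLOSED)
     * replicas are untouched, since detecting their corruption is handled
     * elsewhere (client-reported corrupt blocks or datanode disk checks).
     */
    void onApplyTransactionFailure() {
      if (state == State.OPEN || state == State.CLOSING) {
        state = State.UNHEALTHY;
      }
    }
  }

  public static void main(String[] args) {
    ContainerReplica open = new ContainerReplica(State.OPEN);
    open.onApplyTransactionFailure();
    System.out.println(open.getState());   // UNHEALTHY

    ContainerReplica closed = new ContainerReplica(State.CLOSED);
    closed.onApplyTransactionFailure();
    System.out.println(closed.getState()); // CLOSED
  }
}
```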
The rest of the review comments are addressed in the v5 patch.
> Modify Close Container handling sequence on datanodes
> -----------------------------------------------------
>
> Key: HDDS-709
> URL: https://issues.apache.org/jira/browse/HDDS-709
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDDS-709.000.patch, HDDS-709.001.patch,
> HDDS-709.002.patch, HDDS-709.003.patch, HDDS-709.004.patch, HDDS-709.005.patch
>
>
> With the quasi closed container state for handling majority node failures, the
> close container handling sequence on datanodes needs to change. Once the
> datanodes receive a close container command from SCM, the open container
> replicas will individually be marked as CLOSING. In the CLOSING state,
> only transactions coming from the Ratis leader are allowed; all other
> write transactions will fail. A close container transaction will be queued via
> Ratis on the leader and replayed to the followers, which makes the replicas
> transition to the CLOSED/QUASI CLOSED state.
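The sequence in the description can be sketched as a small per-replica state machine. This is a simplified illustration under stated assumptions: `ContainerReplica`, `onCloseCommandFromScm`, `acceptWrite` and `applyCloseTransaction` are hypothetical names, and the choice between CLOSED and QUASI CLOSED is deliberately left to the caller, since the description does not spell out that rule.

```java
public class CloseSequenceSketch {

  enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED }

  // Hypothetical per-replica state holder on a single datanode.
  static class ContainerReplica {
    private State state = State.OPEN;

    State getState() { return state; }

    // Step 1: the close container command from SCM moves each replica
    // to CLOSING independently.
    void onCloseCommandFromScm() {
      if (state == State.OPEN) {
        state = State.CLOSING;
      }
    }

    // Step 2: while CLOSING, only transactions coming from the Ratis leader
    // are accepted; every other write fails.
    boolean acceptWrite(boolean fromRatisLeader) {
      if (state == State.OPEN) {
        return true;
      }
      if (state == State.CLOSING) {
        return fromRatisLeader;
      }
      return false;
    }

    // Step 3: the close container transaction replayed through Ratis
    // finalizes the replica. ASSUMPTION: the caller decides whether the
    // final state is CLOSED or QUASI_CLOSED.
    void applyCloseTransaction(State finalState) {
      if (finalState != State.CLOSED && finalState != State.QUASI_CLOSED) {
        throw new IllegalArgumentException("Unexpected final state: " + finalState);
      }
      if (state == State.CLOSING) {
        state = finalState;
      }
    }
  }

  public static void main(String[] args) {
    ContainerReplica replica = new ContainerReplica();
    replica.onCloseCommandFromScm();
    System.out.println(replica.getState());         // CLOSING
    System.out.println(replica.acceptWrite(false)); // false
    System.out.println(replica.acceptWrite(true));  // true
    replica.applyCloseTransaction(State.CLOSED);
    System.out.println(replica.getState());         // CLOSED
  }
}
```

Keeping the final-state decision outside the replica keeps the sketch faithful to what the description actually states (OPEN, then CLOSING, then CLOSED or QUASI CLOSED) without guessing the exact condition that selects QUASI CLOSED.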