[
https://issues.apache.org/jira/browse/HDDS-1610?focusedWorklogId=289667&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-289667
]
ASF GitHub Bot logged work on HDDS-1610:
----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Aug/19 13:48
Start Date: 06/Aug/19 13:48
Worklog Time Spent: 10m
Work Description: bshashikant commented on pull request #1226: HDDS-1610.
applyTransaction failure should not be lost on restart.
URL: https://github.com/apache/hadoop/pull/1226#discussion_r311072353
##########
File path:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/XceiverServerRatis.java
##########
@@ -609,6 +609,16 @@ void handleNoLeader(RaftGroupId groupId, RoleInfoProto roleInfoProto) {
handlePipelineFailure(groupId, roleInfoProto);
}
+ void handleApplyTransactionFailure(RaftGroupId groupId,
+ RaftProtos.RaftPeerRole role) {
+ UUID dnId = RatisHelper.toDatanodeId(getServer().getId());
+ String msg =
+ "Ratis Transaction failure in datanode" + dnId + " with role " + role
+ + " Triggering pipeline close action.";
+ triggerPipelineClose(groupId, msg,
ClosePipelineInfo.Reason.PIPELINE_FAILED,
Review comment:
I think the msg already distinguishes the cause of the error. The reason
code exists only so that SCM can act by closing the pipeline, and I don't
think SCM needs to behave differently depending on why the pipeline
failed.
If that becomes necessary, we can add it in a separate jira, since the same
change would be needed for the other causes of pipeline failure.
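A minimal sketch of that split, using hypothetical stand-in types rather
than the actual Ozone/SCM classes (CloseReason, ClosePipelineAction and
PipelineCloseSketch are all invented for illustration): two different
failures share one reason code, the message carries the cause, and the
SCM-side check looks only at the code.

```java
import java.util.UUID;

// Hypothetical stand-ins for illustration only; not the Ozone/SCM types.
enum CloseReason { PIPELINE_FAILED }

final class ClosePipelineAction {
  final String pipelineId;
  final CloseReason reason;     // coarse code that SCM acts on
  final String detailedReason;  // free text that records the actual cause

  ClosePipelineAction(String pipelineId, CloseReason reason,
      String detailedReason) {
    this.pipelineId = pipelineId;
    this.reason = reason;
    this.detailedReason = detailedReason;
  }
}

final class PipelineCloseSketch {
  // Datanode side: no leader was seen for the pipeline.
  static ClosePipelineAction onNoLeader(String pipelineId, UUID dnId) {
    return new ClosePipelineAction(pipelineId, CloseReason.PIPELINE_FAILED,
        "No leader seen by datanode " + dnId);
  }

  // Datanode side: applyTransaction failed. Same reason code, new message.
  static ClosePipelineAction onApplyTransactionFailure(String pipelineId,
      UUID dnId, String role) {
    return new ClosePipelineAction(pipelineId, CloseReason.PIPELINE_FAILED,
        "Ratis transaction failure in datanode " + dnId
            + " with role " + role);
  }

  // SCM side: the decision depends on the reason code, never on the text.
  static boolean shouldClosePipeline(ClosePipelineAction action) {
    return action.reason == CloseReason.PIPELINE_FAILED;
  }
}
```

With this shape, a new failure cause needs only a new message on the
datanode side, and the SCM handler stays unchanged.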
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 289667)
Time Spent: 1h (was: 50m)
> applyTransaction failure should not be lost on restart
> ------------------------------------------------------
>
> Key: HDDS-1610
> URL: https://issues.apache.org/jira/browse/HDDS-1610
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Blocker
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> If applyTransaction fails in the containerStateMachine, then the
> container should not accept new writes on restart.
> This can occur if the following happen in sequence:
> # chunk write applyTransaction fails
> # container state update to UNHEALTHY also fails
> # Ratis snapshot is taken
> # Node restarts
> # container accepts new transactions
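The sequence above can be simulated in a few lines. This is only a sketch
of one possible guard, not the change made in the pull request:
SnapshotStore and StateMachineSketch are hypothetical names, and the
assumption is that a state machine which has seen a failed apply refuses
to take a snapshot, so the failed transaction is replayed after a restart
instead of being skipped.

```java
// Hypothetical in-memory model; these are not Ratis or Ozone classes.
final class SnapshotStore {
  long snapshotIndex = 0;  // survives a "restart" in this simulation
}

final class StateMachineSketch {
  private final SnapshotStore store;
  private long lastApplied;
  private boolean healthy = true;

  StateMachineSketch(SnapshotStore store) {
    this.store = store;
    this.lastApplied = store.snapshotIndex;
  }

  boolean isHealthy() {
    return healthy;
  }

  // badIndex simulates the log entry whose chunk write fails.
  void apply(long index, long badIndex) {
    if (!healthy) {
      throw new IllegalStateException("unhealthy, rejecting index " + index);
    }
    if (index == badIndex) {
      healthy = false;  // remember the failure, do not advance lastApplied
      return;
    }
    lastApplied = index;
  }

  // Assumed guard: never snapshot past a failed transaction.
  long takeSnapshot() {
    if (!healthy) {
      throw new IllegalStateException("unhealthy, snapshot refused");
    }
    store.snapshotIndex = lastApplied;
    return lastApplied;
  }

  // Restart: reload from the snapshot and replay the rest of the log.
  static StateMachineSketch restart(SnapshotStore store, long logEnd,
      long badIndex) {
    StateMachineSketch sm = new StateMachineSketch(store);
    for (long i = store.snapshotIndex + 1; i <= logEnd && sm.isHealthy(); i++) {
      sm.apply(i, badIndex);
    }
    return sm;
  }
}
```

Because the snapshot index never moves past the failed entry, the replay
on restart hits the same failure again and the restarted instance keeps
rejecting writes.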