ivandika3 opened a new pull request, #5725:
URL: https://github.com/apache/ozone/pull/5725
## What changes were proposed in this pull request?
XceiverServerRatis#handlePipelineFailure is called in the following CSM failure scenarios:
- XceiverServerRatis#handleNodeSlowness
  - From StateMachine#notifyFollowerSlowness
  - Timeout set by hdds.ratis.rpc.slowness.timeout (default value 300s)
    - Note: Ratis default value is 60s
- XceiverServerRatis#handleNoLeader
  - From StateMachine#notifyExtendedNoLeader
  - Timeout set by hdds.ratis.notification.no-leader.timeout (default value 300s)
    - Note: Ratis default value is 60s
- XceiverServerRatis#handleInstallSnapshotFromLeader
  - From StateMachine#notifyInstallSnapshotFromLeader
Currently, XceiverServerRatis#handlePipelineFailure does not trigger a
heartbeat to SCM immediately. Instead, the pipeline close action is only sent
with the next scheduled heartbeat (default 60s). Until then, SCM might still
allocate blocks to these "failed" pipelines, which might affect clients
writing to those blocks.
To minimize the impact on clients and on the datanodes in the failed
pipeline, I suggest that the datanode send the pipeline close action
immediately, by triggering a heartbeat, for every close action raised due to
a pipeline failure.
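
To illustrate the difference, here is a minimal, self-contained sketch of the idea. It is a toy model, not the code in this patch: `ImmediateCloseSketch`, `sendHeartbeat`, `getSentHeartbeats` and the `triggerImmediately` flag are hypothetical names, and a heartbeat is modeled as a plain list of pipeline IDs rather than a real SCM message.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model (hypothetical) of a datanode reporting pipeline close actions via heartbeats. */
class ImmediateCloseSketch {
  // Pipeline IDs with a close action queued for the next heartbeat.
  private final Set<String> pendingCloseActions = new LinkedHashSet<>();
  // Each entry is the list of close actions carried by one heartbeat.
  private final List<List<String>> sentHeartbeats = new ArrayList<>();
  // false models the current behavior, true models the proposed behavior.
  private final boolean triggerImmediately;

  ImmediateCloseSketch(boolean triggerImmediately) {
    this.triggerImmediately = triggerImmediately;
  }

  /** Called by the periodic timer, or directly when a heartbeat is triggered. */
  synchronized void sendHeartbeat() {
    sentHeartbeats.add(new ArrayList<>(pendingCloseActions));
    pendingCloseActions.clear();
  }

  /** Queues a close action once per pipeline, then optionally heartbeats right away. */
  synchronized void handlePipelineFailure(String pipelineId) {
    boolean added = pendingCloseActions.add(pipelineId);
    if (added && triggerImmediately) {
      sendHeartbeat();
    }
  }

  /** Returns a copy of all heartbeats sent so far. */
  synchronized List<List<String>> getSentHeartbeats() {
    return new ArrayList<>(sentHeartbeats);
  }

  public static void main(String[] args) {
    // Current behavior: the close action waits for the next scheduled heartbeat.
    ImmediateCloseSketch current = new ImmediateCloseSketch(false);
    current.handlePipelineFailure("pipeline-1");
    System.out.println(current.getSentHeartbeats()); // prints []

    // Proposed behavior: the failure itself triggers a heartbeat.
    ImmediateCloseSketch proposed = new ImmediateCloseSketch(true);
    proposed.handlePipelineFailure("pipeline-1");
    System.out.println(proposed.getSentHeartbeats()); // prints [[pipeline-1]]
  }
}
```

In this model the close action is queued only if it is not already pending, so repeated failure notifications for the same pipeline do not trigger a heartbeat each time.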
## What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-9823
## How was this patch tested?
Existing tests.
Clean CI run: https://github.com/ivandika3/ozone/actions/runs/7084351468