xichen01 opened a new pull request, #4988:
URL: https://github.com/apache/ozone/pull/4988
## What changes were proposed in this pull request?
Currently, SCM resends a `DeletedBlocksTransaction` to a DN whenever that DN
has not reported, via heartbeat, that the transaction has finished. So if the
`DeleteBlocksCommandHandler` thread of a DN is blocked for some reason (for
example, waiting for a container lock), SCM sends a duplicate
`DeletedBlocksTransaction` to that DN.
This PR avoids the issue by tracking the status of each `DeleteBlocksCommand`
in SCM.
## Summary
### The Status of `DeleteBlocksCommand`
```java
public enum CmdStatus {
  // The DeleteBlocksCommand has not yet been sent.
  // This is the initial status of the command after it's created.
  TO_BE_SENT,
  // The DeleteBlocksCommand has been sent to the Datanode, but the
  // Datanode has not reported any new status for it.
  SENT,
  // The DeleteBlocksCommand has been received by the Datanode and
  // is waiting to be executed.
  PENDING_EXECUTED,
  // The DeleteBlocksCommand was executed successfully.
  EXECUTED,
  // The DeleteBlocksCommand was executed but the execution failed,
  // or the command was lost before it was executed.
  NEED_RESEND
}
```
### State Transitions
TO_BE_SENT -> SENT: SCM has sent the DeleteBlocksCommand, and the Datanode
has not yet reported any follow-up status.
SENT -> PENDING_EXECUTED: The Datanode has received the DeleteBlocksCommand
but has not executed it yet; the command is waiting to be executed.
SENT -> NEED_RESEND: The DeleteBlocksCommand was sent but lost before the
Datanode received it.
SENT -> EXECUTED: The DeleteBlocksCommand was sent to the Datanode and
executed successfully.
PENDING_EXECUTED -> PENDING_EXECUTED: The DeleteBlocksCommand is still
waiting to be executed by the Datanode.
PENDING_EXECUTED -> NEED_RESEND: The DeleteBlocksCommand waited for a while
and was executed, but the execution failed; or the DeleteBlocksCommand was
lost while waiting (for example, because the Datanode restarted).
PENDING_EXECUTED -> EXECUTED: The DeleteBlocksCommand waited for a period of
time on the Datanode and was then executed successfully.
### State transition diagram
```mermaid
stateDiagram-v2
TO_BE_SENT
TO_BE_SENT --> SENT
SENT --> PENDING_EXECUTED
SENT --> NEED_RESEND
PENDING_EXECUTED --> PENDING_EXECUTED
PENDING_EXECUTED --> NEED_RESEND
PENDING_EXECUTED --> EXECUTED
SENT --> EXECUTED
```
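The transitions in the diagram can also be written as a lookup table. The following is only a minimal sketch, not code from this PR: the class `CmdStatusTransitions` and the method `isValidTransition` are hypothetical names, and the enum is redeclared locally so the example compiles on its own. `EXECUTED` and `NEED_RESEND` have no outgoing transitions here because the description lists none; how a resent command re-enters the cycle is not covered by this sketch.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of the PR): encodes the transitions
// listed in the description as an adjacency table.
final class CmdStatusTransitions {
  // Local copy of the enum from the description, for a self-contained example.
  enum CmdStatus { TO_BE_SENT, SENT, PENDING_EXECUTED, EXECUTED, NEED_RESEND }

  private static final Map<CmdStatus, Set<CmdStatus>> ALLOWED =
      new EnumMap<>(CmdStatus.class);

  static {
    ALLOWED.put(CmdStatus.TO_BE_SENT, EnumSet.of(CmdStatus.SENT));
    ALLOWED.put(CmdStatus.SENT, EnumSet.of(
        CmdStatus.PENDING_EXECUTED, CmdStatus.NEED_RESEND, CmdStatus.EXECUTED));
    ALLOWED.put(CmdStatus.PENDING_EXECUTED, EnumSet.of(
        CmdStatus.PENDING_EXECUTED, CmdStatus.NEED_RESEND, CmdStatus.EXECUTED));
    // Assumption: no outgoing transitions are listed for these two states.
    ALLOWED.put(CmdStatus.EXECUTED, EnumSet.noneOf(CmdStatus.class));
    ALLOWED.put(CmdStatus.NEED_RESEND, EnumSet.noneOf(CmdStatus.class));
  }

  // Returns true if the description lists a transition from 'from' to 'to'.
  static boolean isValidTransition(CmdStatus from, CmdStatus to) {
    return ALLOWED.get(from).contains(to);
  }

  public static void main(String[] args) {
    // SENT -> EXECUTED is listed in the description.
    System.out.println(isValidTransition(CmdStatus.SENT, CmdStatus.EXECUTED));
    // EXECUTED -> SENT is not listed.
    System.out.println(isValidTransition(CmdStatus.EXECUTED, CmdStatus.SENT));
  }
}
```

A table like this makes it easy to reject or log an unexpected status update instead of silently overwriting the current status.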
### Resending DeleteBlocksCommand
A `DeleteBlocksCommand` in the `TO_BE_SENT`, `SENT`, `PENDING_EXECUTED`, or
`EXECUTED` status will not be resent by SCM. Only a `DeleteBlocksCommand` in
the `NEED_RESEND` status will be resent.
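The resend rule reduces to a single status check. The sketch below is illustrative only and assumes a hypothetical map from command ID to status; `ResendFilter`, `shouldResend`, and `commandsToResend` are names made up for this example and are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the PR): selects the commands that
// are eligible for resending according to the rule in the description.
final class ResendFilter {
  // Local copy of the enum from the description, for a self-contained example.
  enum CmdStatus { TO_BE_SENT, SENT, PENDING_EXECUTED, EXECUTED, NEED_RESEND }

  // Only NEED_RESEND qualifies; every other status means the command is
  // either still in flight or already finished.
  static boolean shouldResend(CmdStatus status) {
    return status == CmdStatus.NEED_RESEND;
  }

  // Assumption: commands are tracked in a map keyed by a numeric command ID.
  static List<Long> commandsToResend(Map<Long, CmdStatus> statusByCommandId) {
    List<Long> result = new ArrayList<>();
    for (Map.Entry<Long, CmdStatus> entry : statusByCommandId.entrySet()) {
      if (shouldResend(entry.getValue())) {
        result.add(entry.getKey());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<Long, CmdStatus> statuses = new LinkedHashMap<>();
    statuses.put(1L, CmdStatus.SENT);
    statuses.put(2L, CmdStatus.PENDING_EXECUTED);
    statuses.put(3L, CmdStatus.NEED_RESEND);
    statuses.put(4L, CmdStatus.EXECUTED);
    // Only command 3 is in NEED_RESEND.
    System.out.println(commandsToResend(statuses));
  }
}
```

With this rule, a command whose handler thread is merely blocked on the DN stays in `SENT` or `PENDING_EXECUTED` and is therefore not sent a second time.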
## What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-8882
## How was this patch tested?
Integration test.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]