sodonnel opened a new pull request, #4069: URL: https://github.com/apache/ozone/pull/4069
## What changes were proposed in this pull request?

Both the new and the old replication managers send commands to the datanodes. If a command has not been processed on the datanode within the ReplicationManager event.timeout, RM assumes the command has failed for some reason and may send another one to the same or a different host.

It makes sense to drop any command not processed on the datanode slightly before ReplicationManager gives up on it. This matters especially for delete container commands: we don't want two or more deletes pending in the system for the same container when RM thinks there is only one.

To facilitate dropping the commands, we can add a deadline to all commands. For the commands we want to enforce a deadline on, we set the deadline in SCM and check for it on the DN side.

This change ensures that all commands sent to a datanode from RM have a deadline set to 0.9 * event.timeout. On the datanode side, we only enforce the deadline on ReplicationContainer, DeleteContainer and ECReconstruction commands.

This has turned into quite a large change, but it is split into individual commits for each stage so they can be reviewed in isolation.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7618

## How was this patch tested?

Various new unit tests added.
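
For readers skimming the description, the deadline idea can be illustrated with a minimal sketch. This is not the code in the PR: the class, method and field names below are hypothetical. It also assumes the deadline is an absolute epoch-millisecond timestamp computed as "now + 0.9 * event.timeout", and that a deadline of 0 means none was set.

```java
import java.time.Duration;

/** Hypothetical sketch of a command deadline; not Ozone's actual API. */
final class CommandDeadlineSketch {

  /** Fraction of event.timeout granted to the datanode (0.9 per the description). */
  static final double DEADLINE_FACTOR = 0.9;

  /** Minimal stand-in for a command sent from SCM to a datanode. */
  static final class Command {
    private final String type;
    private long deadlineEpochMs; // assumption: 0 means "no deadline set"

    Command(String type) {
      this.type = type;
    }

    String getType() {
      return type;
    }

    void setDeadline(long deadlineEpochMs) {
      this.deadlineEpochMs = deadlineEpochMs;
    }

    /** True only if a deadline was set and the given time is past it. */
    boolean hasExpired(long nowEpochMs) {
      return deadlineEpochMs > 0 && nowEpochMs > deadlineEpochMs;
    }
  }

  /** SCM side: compute a deadline slightly before RM's own timeout fires. */
  static long computeDeadline(long nowEpochMs, Duration eventTimeout) {
    return nowEpochMs + Math.round(eventTimeout.toMillis() * DEADLINE_FACTOR);
  }

  /** Datanode side: skip commands whose deadline has already passed. */
  static boolean shouldProcess(Command cmd, long nowEpochMs) {
    return !cmd.hasExpired(nowEpochMs);
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    // The 10 minute timeout is an example value, not a documented default.
    Command delete = new Command("deleteContainer");
    delete.setDeadline(computeDeadline(now, Duration.ofMinutes(10)));

    System.out.println(delete.getType() + " processed on time: "
        + shouldProcess(delete, now + Duration.ofMinutes(5).toMillis()));
    System.out.println(delete.getType() + " processed late: "
        + shouldProcess(delete, now + Duration.ofMinutes(9).toMillis() + 1));
  }
}
```

Because the datanode's cutoff falls before RM's own timeout, a late command is dropped before RM can decide to resend it, which is what prevents duplicate deletes from piling up for the same container.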
