sodonnel opened a new pull request, #4069:
URL: https://github.com/apache/ozone/pull/4069

   ## What changes were proposed in this pull request?
   
   Both the new and the legacy ReplicationManager send commands to the datanodes. 
If a command has not been processed on the datanode within the ReplicationManager 
event.timeout, RM assumes the command has failed for some reason, and may send 
another one to the same or a different host.
   
   It makes sense to drop any command not processed on the datanode slightly 
before ReplicationManager gives up on it. This matters most for delete container 
commands: we don't want two or more deletes pending in the system for the same 
container when RM thinks there is only one.
   
   To facilitate dropping the commands, we can add a deadline field to all 
commands. For the commands we want to enforce a deadline on, we set the deadline 
in SCM and check it on the DN side.
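
   As a rough sketch of the idea (the class and method names below are 
hypothetical and are not the classes touched by this PR), a command could carry 
an optional deadline that SCM sets and the datanode checks before processing:

```java
// Hypothetical stand-in for an SCM-to-datanode command; not the actual Ozone class.
class DeadlineCommand {
  // Epoch milliseconds after which the datanode should drop the command.
  // Assumption for this sketch: 0 means "no deadline set", so the command never expires.
  private long deadlineMsSinceEpoch = 0;

  // Called on the SCM side for commands that should be dropped when late.
  void setDeadline(long deadlineMsSinceEpoch) {
    this.deadlineMsSinceEpoch = deadlineMsSinceEpoch;
  }

  long getDeadline() {
    return deadlineMsSinceEpoch;
  }

  // Called on the datanode side just before the command would be processed.
  boolean hasExpired(long nowMsSinceEpoch) {
    return deadlineMsSinceEpoch > 0 && nowMsSinceEpoch > deadlineMsSinceEpoch;
  }
}
```

   With this shape, commands that never get a deadline behave exactly as 
before, and only the handlers that opt in need to call `hasExpired`.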
   
   This change ensures that all commands sent to a datanode from RM will have a 
deadline set to 0.9 * event.timeout. On the datanode side, we only enforce the 
deadline on ReplicationContainer, DeleteContainer and ECReconstruction commands.
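
   To make the arithmetic concrete, here is a minimal sketch of how such a 
deadline could be computed. The helper name and the 10 minute timeout in the 
usage note are illustrative assumptions, not values taken from this PR:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper; the real change may compute the deadline differently.
class DeadlineCalculator {
  // Fraction of event.timeout after which the datanode should drop the command.
  static final double DEADLINE_FACTOR = 0.9;

  // Returns the absolute deadline in epoch milliseconds for a command sent at 'now'.
  static long deadlineFor(Instant now, Duration eventTimeout) {
    long budgetMs = Math.round(eventTimeout.toMillis() * DEADLINE_FACTOR);
    return now.toEpochMilli() + budgetMs;
  }
}
```

   For example, with an assumed event.timeout of 10 minutes, 
`DeadlineCalculator.deadlineFor(now, Duration.ofMinutes(10))` gives a deadline 
9 minutes after `now`, leaving the datanode one minute of headroom before RM 
gives up on the command.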
   
   This has turned into quite a large change, but it is split into one commit 
per stage so that each can be reviewed in isolation.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-7618
   
   ## How was this patch tested?
   
   Various new unit tests added.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

