sodonnel opened a new pull request, #4636:
URL: https://github.com/apache/ozone/pull/4636

   ## What changes were proposed in this pull request?
   
   We should make it possible to configure a global replication limit that caps 
the number of inflight container replications, i.e. containers pending 
creation. A larger cluster can sustain more inflight replication than a smaller 
one, so the limit should be a function of the number of datanodes in the 
cluster, the maximum number of commands that can be queued per datanode, and a 
weighting factor.
   
   For example, if each datanode can queue 20 replication commands and there 
are 100 nodes in the cluster, then the natural limit is 20 * 100 = 2000. 
However, that assumes commands are queued evenly across all datanodes, which is 
unlikely. With a global limit we would prefer that the datanodes are not all 
fully loaded with replication commands simultaneously, so we may want to impose 
a limit of half that number using a factor of 0.5, e.g. 20 * 100 * 0.5 = 1000 
pending replications.
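
   The arithmetic above can be sketched in a few lines of Java. This is only an 
illustration: the class name, method name and parameter names are hypothetical 
and not taken from the Ozone code base, and rounding the result down is an 
assumption.

```java
// Hypothetical sketch of the global limit calculation described above.
// None of these names come from the Ozone code base.
final class ReplicationLimitSketch {

  /**
   * Global limit = per-datanode queue limit * number of datanodes * factor.
   * Rounding down to a whole number of replications is an assumption.
   */
  static long globalLimit(int perDatanodeLimit, int datanodeCount,
      double factor) {
    if (perDatanodeLimit < 0 || datanodeCount < 0 || factor < 0) {
      throw new IllegalArgumentException("inputs must be non-negative");
    }
    return (long) Math.floor((double) perDatanodeLimit * datanodeCount * factor);
  }

  public static void main(String[] args) {
    // 20 commands per datanode, 100 datanodes, no weighting: 2000
    System.out.println(globalLimit(20, 100, 1.0));
    // Same cluster with a factor of 0.5: 1000
    System.out.println(globalLimit(20, 100, 0.5));
  }
}
```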
   
   At one extreme this would result in all datanodes in the cluster having half 
their maximum tasks queued, but in practice some DNs are likely to be at their 
limit while others have fewer than half queued, or none at all.
   
   If the limits were perfectly defined, such that a datanode completes all its 
queued work exactly at the end of the heartbeat interval, then halving the 
number of queued commands would keep the datanode busy for only half of that 
interval. As the datanodes all heartbeat at different times, the busy and idle 
periods across the datanodes would combine into a load profile in which some 
datanodes are idle at any given time, reducing the overall load on the cluster.
   
   This PR introduces a default limit factor of 0.75; the limit can be disabled 
by setting the factor to 0. Only inflight replications are throttled via this 
mechanism. Container deletes are still throttled by the per-datanode limits 
introduced earlier.
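
   As a rough illustration of the behaviour described here, the following Java 
sketch gates new replications on a global inflight count and treats a factor of 
0 as "no global limit". It is not the code in this PR: the class, its methods 
and the single-threaded counter are all assumptions made for the example.

```java
// Hypothetical sketch of a global inflight replication throttle.
// None of these names come from the Ozone code base, and the class is
// deliberately not thread-safe to keep the example short.
final class InflightThrottleSketch {
  private final boolean enabled; // a factor of 0 disables the global limit
  private final long limit;      // perDatanodeLimit * datanodeCount * factor
  private long inflight;         // replications currently pending

  InflightThrottleSketch(int perDatanodeLimit, int datanodeCount,
      double factor) {
    this.enabled = factor > 0;
    this.limit = (long) ((double) perDatanodeLimit * datanodeCount * factor);
  }

  /** Returns true and counts the replication if the global limit allows it. */
  boolean tryScheduleReplication() {
    if (enabled && inflight >= limit) {
      return false; // at the global limit: defer this replication
    }
    inflight++;
    return true;
  }

  /** Called when a pending replication completes or is abandoned. */
  void replicationFinished() {
    if (inflight > 0) {
      inflight--;
    }
  }

  // Deletes are intentionally not counted here; per the description they
  // are governed only by the per-datanode limits.

  public static void main(String[] args) {
    // 2 commands per datanode, 2 datanodes, factor 0.75: global limit of 3
    InflightThrottleSketch throttle = new InflightThrottleSketch(2, 2, 0.75);
    for (int i = 0; i < 4; i++) {
      System.out.println(throttle.tryScheduleReplication());
    }
  }
}
```

   In this sketch the first three calls succeed and the fourth is refused until 
`replicationFinished()` frees a slot.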
   
   ## What is the link to the Apache JIRA?
   
   https://issues.apache.org/jira/browse/HDDS-8505
   
   ## How was this patch tested?
   
   New unit tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

