[jira] [Updated] (HDDS-8505) ReplicationManager: Add configurable global replication limit

ASF GitHub Bot (Jira) Tue, 02 May 2023 04:28:06 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated HDDS-8505:
---------------------------------
    Labels: pull-request-available  (was: )

> ReplicationManager: Add configurable global replication limit
> -------------------------------------------------------------
>
>                 Key: HDDS-8505
>                 URL: https://issues.apache.org/jira/browse/HDDS-8505
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>
> We should make it possible to configure a global replication limit that caps 
> the number of inflight containers pending creation. A larger cluster can 
> sustain more inflight replication than a smaller one, so the limit should be 
> a function of three values: the number of datanodes in the cluster, the 
> maximum number of commands that can be queued per datanode, and a weighting 
> factor.
> For example, if each datanode can queue 20 replication commands and there 
> are 100 nodes in the cluster, the natural limit is 20 * 100 = 2000. However, 
> that assumes commands are queued evenly across all datanodes, which is 
> unlikely. With a global limit, we would prefer that not all datanodes are 
> fully loaded with replication commands simultaneously, so we may want to 
> impose a limit of half that number by using a factor of 0.5, e.g. 
> 20 * 100 * 0.5 = 1000 pending replications.
> At one extreme, this would result in every datanode in the cluster having 
> half its maximum tasks queued, but in practice some DNs are likely to be at 
> their limit while others have fewer than half, or none, queued.
> If the limits were perfectly defined, such that a datanode completes all its 
> queued work exactly at the end of its heartbeat interval, then halving the 
> number of queued commands would keep the datanode busy for only half of that 
> interval. As the datanodes all heartbeat at different times, the busy and 
> idle periods across the datanodes would combine into a load profile in which 
> some datanodes are idle at any given time, reducing the overall load on the 
> cluster.
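
The arithmetic in the description can be sketched in Java as below. This is only an illustration of the proposed formula, not the code from the pull request: the class name GlobalReplicationLimit, the methods compute and canSchedule, and the rule that the factor must lie in (0, 1] are all assumptions made for this example.

```java
/**
 * Hypothetical helper illustrating the proposed global replication limit.
 * None of these names are taken from the Ozone code base.
 */
public final class GlobalReplicationLimit {

  private GlobalReplicationLimit() {
  }

  /**
   * Computes datanodeCount * perDatanodeQueueLimit * factor, rounded down.
   *
   * @param datanodeCount number of datanodes in the cluster
   * @param perDatanodeQueueLimit max replication commands queued per datanode
   * @param factor weighting factor; restricting it to (0, 1] is an assumption
   */
  public static long compute(int datanodeCount, int perDatanodeQueueLimit,
      double factor) {
    if (datanodeCount < 0 || perDatanodeQueueLimit < 0) {
      throw new IllegalArgumentException("counts must not be negative");
    }
    if (factor <= 0.0 || factor > 1.0) {
      throw new IllegalArgumentException("factor must be in (0, 1]");
    }
    // Multiply as long first so large clusters cannot overflow an int.
    long naturalLimit = (long) datanodeCount * perDatanodeQueueLimit;
    return (long) Math.floor(naturalLimit * factor);
  }

  /** Returns true while the inflight count is still below the limit. */
  public static boolean canSchedule(long inflight, long limit) {
    return inflight < limit;
  }

  public static void main(String[] args) {
    // The example from the description: 20 * 100 * 0.5
    long limit = compute(100, 20, 0.5);
    System.out.println("global limit = " + limit);
    System.out.println("can schedule at 999 inflight: "
        + canSchedule(999, limit));
    System.out.println("can schedule at 1000 inflight: "
        + canSchedule(1000, limit));
  }
}
```

With the values from the description (100 datanodes, 20 commands each, factor 0.5), compute returns 1000, and a factor of 1.0 gives the natural limit of 2000.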



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
