[ https://issues.apache.org/jira/browse/HDDS-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Doroszlai updated HDDS-8505:
-----------------------------------
Fix Version/s: 1.4.0
Resolution: Implemented
Status: Resolved (was: Patch Available)
> ReplicationManager: Add configurable global replication limit
> -------------------------------------------------------------
>
> Key: HDDS-8505
> URL: https://issues.apache.org/jira/browse/HDDS-8505
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: SCM
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.4.0
>
>
> We should make it possible to configure a global replication limit that caps
> the number of inflight containers pending creation. A larger cluster can
> sustain more inflight replication than a smaller one, so the limit should be
> a function of the number of datanodes in the cluster, the number of commands
> that can be queued per datanode, and a weighting factor.
> For example, if each datanode can queue 20 replication commands and there
> are 100 nodes in the cluster, then the natural limit is 20 * 100 = 2000.
> However, that assumes commands are queued evenly across all datanodes, which
> is unlikely. With a global limit we would prefer that the datanodes are not
> all fully loaded with replication commands simultaneously, so we may want to
> impose a limit of half that number using a factor of 0.5, e.g.
> 20 * 100 * 0.5 = 1000 pending replications.
> At one extreme this would result in every datanode in the cluster having half
> its maximum tasks queued, but in practice some DNs are likely to be at
> their limit while others have none, or fewer than half, queued.
> If the limits were perfectly defined, such that a datanode completes all its
> queued work exactly at the end of the heartbeat interval, then halving the
> number of queued commands would keep the datanode busy for only half of its
> heartbeat interval. As the datanodes all heartbeat at different times, the
> busy and idle periods across all the datanodes would combine into a load
> profile in which, at any given moment, some datanodes are idle, reducing the
> overall load on the cluster.
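
The arithmetic in the description above can be sketched as follows. This is a minimal illustration only, not the code merged for HDDS-8505: the class name, method name, and parameter names are hypothetical, and rounding down to a whole number of commands is an assumption.

```java
/**
 * Hypothetical sketch of the global replication limit described in the issue:
 * limit = datanodes * perDatanodeQueueLimit * factor.
 * None of these names are taken from the Ozone code base.
 */
public final class GlobalReplicationLimit {

  private GlobalReplicationLimit() {
  }

  /**
   * @param datanodes number of datanodes in the cluster
   * @param perDatanodeQueueLimit replication commands one datanode may queue
   * @param factor weighting factor, assumed here to lie in (0, 1]
   * @return the cluster-wide cap on inflight replications, rounded down
   */
  public static long computeLimit(int datanodes, int perDatanodeQueueLimit,
      double factor) {
    if (datanodes < 0 || perDatanodeQueueLimit < 0) {
      throw new IllegalArgumentException("counts must be non-negative");
    }
    if (factor <= 0.0 || factor > 1.0) {
      throw new IllegalArgumentException("factor must be in (0, 1]");
    }
    // Widen to long before multiplying so large clusters cannot overflow int.
    long naturalLimit = (long) datanodes * perDatanodeQueueLimit;
    return (long) Math.floor(naturalLimit * factor);
  }

  public static void main(String[] args) {
    // The example from the description: 20 commands, 100 nodes, factor 0.5.
    System.out.println(computeLimit(100, 20, 0.5)); // prints 1000
  }
}
```

With a factor of 1.0 the same call returns the natural limit of 2000, which corresponds to every datanode being fully loaded at once.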