[
https://issues.apache.org/jira/browse/HDDS-13437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-13437:
----------------------------------
Labels: pull-request-available (was: )
> Avoid scheduling replications on full datanodes by tracking pending op size
> in SCM
> ----------------------------------------------------------------------------------
>
> Key: HDDS-13437
> URL: https://issues.apache.org/jira/browse/HDDS-13437
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: SCM
> Reporter: Siddhant Sangwan
> Assignee: Siddhant Sangwan
> Priority: Major
> Labels: pull-request-available
>
> SCM schedules replication commands to fix under-replication and
> mis-replication, and for container moves, decommissioning, etc., for both
> Ratis and EC containers. Before selecting a Datanode as the target (the node
> that the container will be replicated to), it checks whether that Datanode
> has 2 * containerSize of free space. However, it does not track the size of
> the data it has already scheduled that is still in flight. This means it can
> over-schedule replications to a target Datanode that does not have enough
> space.
> For example, if the remaining space on a Datanode is 30 GB and the minimum
> free space is 20 GB, only 10 GB is usable, so it can accept only 1 more
> container (2 * 5 GB, see https://issues.apache.org/jira/browse/HDDS-12426).
> Currently, the replication manager will keep selecting it as a target for
> multiple containers.
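
The idea in the description can be illustrated with a minimal sketch. This is
not the actual SCM code: the class name PendingReplicationTracker, the methods
tryReserve and complete, and the choice to reserve 2 * containerSize per
in-flight replication are all assumptions made for illustration. It reuses the
numbers from the example above (30 GB remaining, 20 GB minimum free space,
5 GB containers).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; these are not the real Ozone SCM classes. */
public class PendingReplicationTracker {
  private static final long GB = 1024L * 1024 * 1024;

  private final long containerSize;
  private final long minFreeSpace;
  // Bytes already scheduled to each Datanode and still in flight.
  private final Map<String, Long> pendingBytes = new HashMap<>();

  public PendingReplicationTracker(long containerSize, long minFreeSpace) {
    this.containerSize = containerSize;
    this.minFreeSpace = minFreeSpace;
  }

  /**
   * Returns true and records a reservation if the Datanode can still take one
   * more container after subtracting what is already in flight.
   * Assumption: each pending replication reserves 2 * containerSize.
   */
  public synchronized boolean tryReserve(String datanodeId, long reportedRemaining) {
    long required = 2 * containerSize;
    long pending = pendingBytes.getOrDefault(datanodeId, 0L);
    long usable = reportedRemaining - minFreeSpace - pending;
    if (usable < required) {
      return false;
    }
    pendingBytes.put(datanodeId, pending + required);
    return true;
  }

  /** Releases one reservation once a replication completes or times out. */
  public synchronized void complete(String datanodeId) {
    long left = pendingBytes.getOrDefault(datanodeId, 0L) - 2 * containerSize;
    if (left > 0) {
      pendingBytes.put(datanodeId, left);
    } else {
      pendingBytes.remove(datanodeId);
    }
  }

  public static void main(String[] args) {
    PendingReplicationTracker tracker =
        new PendingReplicationTracker(5 * GB, 20 * GB);
    // 30 GB remaining, 20 GB min free space: 10 GB usable, room for one container.
    System.out.println(tracker.tryReserve("dn1", 30 * GB)); // true
    // Same space report, but 10 GB is already in flight: rejected.
    System.out.println(tracker.tryReserve("dn1", 30 * GB)); // false
  }
}
```

Without the pendingBytes subtraction, both calls would succeed, which is the
over-scheduling behaviour described above. How the real change sizes and
expires pending operations is not stated in this message.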
--
This message was sent by Atlassian Jira
(v8.20.10#820010)