[ https://issues.apache.org/jira/browse/HDDS-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599707#comment-17599707 ]
Stephen O'Donnell commented on HDDS-7198:
-----------------------------------------
For EC, things will be worse. We have opted to do container copy for EC, so
the decommissioning node will only ever be the sole source.
> SCM could shuffle the list of in service replicas and place the
> decommissioning replica last in the list.
A decommissioning node will have zero write load. Perhaps we could also return
it last for normal reads, to reduce the read load on it too. However, we don't
want to de-prioritize the decommissioning nodes completely: if they are serving
no writes and potentially no reads, they will otherwise be idle.
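
As a rough illustration of the quoted suggestion (shuffle the in-service
replicas, then place the decommissioning replica last), here is a minimal
Python sketch. It is not Ozone code: `Replica`, `order_sources` and the state
strings are hypothetical names used only for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    datanode: str
    state: str  # hypothetical values: "IN_SERVICE" or "DECOMMISSIONING"


def order_sources(replicas, rng=random):
    """Return sources with in-service replicas shuffled first and
    all other replicas (e.g. decommissioning) shuffled last."""
    in_service = [r for r in replicas if r.state == "IN_SERVICE"]
    others = [r for r in replicas if r.state != "IN_SERVICE"]
    rng.shuffle(in_service)
    rng.shuffle(others)
    # Nodes that are leaving stay in the list as a fallback; they are
    # only tried after every in-service source has failed.
    return in_service + others
```

With this ordering the decommissioning node is still a valid source, just the
last one tried. For EC, where it is the only source, the ordering changes
nothing.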
> Datanodes should avoid using decommissioning nodes as a container replication
> source
> ------------------------------------------------------------------------------------
>
> Key: HDDS-7198
> URL: https://issues.apache.org/jira/browse/HDDS-7198
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Datanode, SCM
> Reporter: Ethan Rose
> Priority: Major
>
> Currently when SCM tells a target datanode to replicate a container, it sends
> the target datanode an ordered list of source datanodes it should download
> the container from. The target then shuffles the list and tries to download
> from the sources in the resulting order one by one until one of them succeeds.
> In failure scenarios this works fine. The node that had the failure will not
> be included in the source list, distributing the source replication load
> throughout the cluster. However, when a datanode is decommissioning, it will
> be included in the source list with no distinction from other replicas,
> causing it to bear a disproportionate amount of the replication load.
> For example, if every container in the cluster has three replicas and one
> datanode is being decommissioned, the decommissioning node will be the source
> for about 33% of the replications, while the other 67% will be distributed
> throughout the cluster based on placement of the other container replicas.
> With datanodes currently throttled at 10 concurrent replication requests,
> this will place continuous load on the decommissioning node (which may
> already be in a bad state, which is why it is being removed), while
> decreasing parallelization of the overall replications required.
>
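
The one-third figure in the description can be checked with a small
calculation. This is only a sketch: it assumes a uniform shuffle of the source
list and that the first source tried always succeeds, and the container count
and the function name `expected_source_load` are made up for illustration.

```python
from fractions import Fraction


def expected_source_load(containers, replicas_per_container):
    """Expected number of replications served by the decommissioning
    node versus all other nodes, when each source is equally likely
    to be tried first."""
    share = Fraction(1, replicas_per_container)
    from_decommissioning = containers * share
    from_others = containers - from_decommissioning
    return from_decommissioning, from_others


# Hypothetical: 9000 containers on the decommissioning node, 3 replicas each.
decom, others = expected_source_load(9000, 3)
print(decom, others)
```

Under these assumptions a single node serves one third of the copies, while
the remaining two thirds are spread over every node that holds one of the
other replicas.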