[ 
https://issues.apache.org/jira/browse/HDDS-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599707#comment-17599707
 ] 

Stephen O'Donnell commented on HDDS-7198:
-----------------------------------------

For EC, things will be worse. We have opted to do container copy for EC, so 
the decommissioning node will only ever be the sole source.

> SCM could shuffle the list of in service replicas and place the 
> decommissioning replica last in the list.

A decommissioning node will have zero write load. Perhaps we could also return 
it last for normal reads, to reduce the read load on it too. However, we don't 
want to de-prioritize the decommissioning nodes completely: if they are not 
serving writes, and potentially not reads, they will otherwise be idle.
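
To make the quoted suggestion concrete, here is a minimal sketch of how a 
source list could be ordered so that in-service replicas are shuffled and 
decommissioning replicas are tried last. The class, enum and method names 
(SourceOrdering, State, Source, orderSources) are hypothetical and are not 
Ozone APIs. Note that this ordering puts decommissioning nodes strictly last, 
which is exactly the full de-prioritization questioned above, and it does not 
help the EC case where the decommissioning node is the only source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative only: these types are hypothetical stand-ins, not Ozone classes.
final class SourceOrdering {

    /** Simplified operational state of a datanode (assumption). */
    enum State { IN_SERVICE, DECOMMISSIONING }

    /** Simplified replica source: a host name plus its state (assumption). */
    static final class Source {
        final String host;
        final State state;

        Source(String host, State state) {
            this.host = host;
            this.state = state;
        }
    }

    /**
     * Returns a new list in which in-service sources come first in random
     * order, followed by decommissioning sources, also in random order.
     * The input list is not modified.
     */
    static List<Source> orderSources(List<Source> sources, Random rng) {
        List<Source> inService = new ArrayList<>();
        List<Source> decommissioning = new ArrayList<>();
        for (Source s : sources) {
            if (s.state == State.DECOMMISSIONING) {
                decommissioning.add(s);
            } else {
                inService.add(s);
            }
        }
        // Shuffle each group separately so load spreads within a group
        // while the decommissioning group stays at the end.
        Collections.shuffle(inService, rng);
        Collections.shuffle(decommissioning, rng);

        List<Source> ordered = new ArrayList<>(inService);
        ordered.addAll(decommissioning);
        return ordered;
    }
}
```

A target that walks this list in order would only reach the decommissioning 
node after every in-service source has failed. A softer variant could instead 
give decommissioning nodes a reduced, non-zero chance of being picked first, 
so that they are not left entirely idle.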

> Datanodes should avoid using decommissioning nodes as a container replication 
> source
> ------------------------------------------------------------------------------------
>
>                 Key: HDDS-7198
>                 URL: https://issues.apache.org/jira/browse/HDDS-7198
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Datanode, SCM
>            Reporter: Ethan Rose
>            Priority: Major
>
> Currently when SCM tells a target datanode to replicate a container, it sends 
> the target datanode an ordered list of source datanodes it should download 
> the container from. The target then shuffles the list and tries to download 
> from the sources in the resulting order one by one until one of them succeeds.
>
> In failure scenarios this works fine. The node that had the failure will not 
> be included in the source list, distributing the source replication load 
> throughout the cluster. However, when a datanode is decommissioning, it will 
> be included in the source list with no distinction from other replicas, 
> causing it to bear a disproportionate amount of the replication load.
>
> For example, if every container in the cluster has three replicas and one 
> datanode is being decommissioned, the decommissioning node will be the source 
> for 33% of the replications, while the other 66% will be distributed 
> throughout the cluster based on placement of the other container replicas. 
> With datanodes currently throttled at 10 concurrent replication requests, 
> this will place continuous load on the decommissioning node (which may 
> already be in a bad state, which is why it is being removed), while reducing 
> the parallelism of the overall replication work required.
>  


