[ 
https://issues.apache.org/jira/browse/HDDS-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599171#comment-17599171
 ] 

Ethan Rose commented on HDDS-7198:
----------------------------------

One solution is to have SCM decide the order of replication sources for the 
datanodes. SCM could shuffle the list of in-service replicas and place the 
decommissioning replica last in the list. Datanodes would then try the sources 
in the order SCM provides instead of shuffling the list themselves. With this 
approach datanodes do exactly as SCM tells them, which fits its role as the 
master service for HDDS. It also leaves room for more advanced replication 
control in the future, where SCM could order source replicas based on 
in-flight replications to balance load throughout the cluster.
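
As a rough illustration of the ordering described above, here is a minimal 
sketch in Java. The names ReplicationSourceOrdering, Replica, NodeState and 
orderSources are hypothetical and are not taken from the Ozone code base; the 
sketch only assumes that SCM knows which replicas sit on decommissioning 
nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ReplicationSourceOrdering {

  /** Hypothetical node states for this sketch; not Ozone's actual enum. */
  enum NodeState { IN_SERVICE, DECOMMISSIONING }

  /** Hypothetical replica descriptor: a datanode name plus its state. */
  static final class Replica {
    final String datanode;
    final NodeState state;

    Replica(String datanode, NodeState state) {
      this.datanode = datanode;
      this.state = state;
    }
  }

  /**
   * Returns the sources in the order a target datanode should try them:
   * in-service replicas first, in random order, followed by replicas on
   * decommissioning nodes as a last resort.
   */
  static List<Replica> orderSources(List<Replica> replicas, Random random) {
    List<Replica> inService = new ArrayList<>();
    List<Replica> decommissioning = new ArrayList<>();
    for (Replica replica : replicas) {
      if (replica.state == NodeState.DECOMMISSIONING) {
        decommissioning.add(replica);
      } else {
        inService.add(replica);
      }
    }
    // Shuffle only the in-service replicas so load spreads across them.
    Collections.shuffle(inService, random);
    List<Replica> ordered = new ArrayList<>(inService);
    ordered.addAll(decommissioning);
    return ordered;
  }

  public static void main(String[] args) {
    List<Replica> replicas = Arrays.asList(
        new Replica("dn1", NodeState.DECOMMISSIONING),
        new Replica("dn2", NodeState.IN_SERVICE),
        new Replica("dn3", NodeState.IN_SERVICE));
    // dn1 is always printed last; dn2 and dn3 appear in random order.
    for (Replica replica : orderSources(replicas, new Random())) {
      System.out.println(replica.datanode + " " + replica.state);
    }
  }
}
```

In this sketch the target datanode would simply iterate the returned list and 
stop at the first source that succeeds, so the decommissioning node is only 
contacted if every in-service source fails.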

> Datanodes should avoid using decommissioning nodes as a container replication 
> source
> ------------------------------------------------------------------------------------
>
>                 Key: HDDS-7198
>                 URL: https://issues.apache.org/jira/browse/HDDS-7198
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Datanode, SCM
>            Reporter: Ethan Rose
>            Priority: Major
>
> Currently when SCM tells a target datanode to replicate a container, it sends 
> the target datanode an ordered list of source datanodes it should download 
> the container from. The target then shuffles the list and tries to download 
> from the sources in the resulting order one by one until one of them succeeds.
> In failure scenarios this works fine. The node that had the failure will not 
> be included in the source list, distributing the source replication load 
> throughout the cluster. However, when a datanode is decommissioning, it will 
> be included in the source list with no distinction from other replicas, 
> causing it to bear a disproportionate amount of the replication load.
> For example, if every container in the cluster has three replicas and one 
> datanode is being decommissioned, the decommissioning node will be the source 
> for about 33% of the replications, while the other 67% or so will be 
> distributed throughout the cluster based on placement of the other container 
> replicas. With datanodes currently throttled at 10 concurrent replication 
> requests, this will place continuous load on the decommissioning node (which 
> may already be in a bad state, which is why it is being removed), while 
> decreasing parallelization of the overall replications required.
>  


