Attila Doroszlai created HDDS-8617:
--------------------------------------
Summary: Ratis underreplication due to maintenance is not deprioritised
Key: HDDS-8617
URL: https://issues.apache.org/jira/browse/HDDS-8617
Project: Apache Ozone
Issue Type: Sub-task
Components: SCM
Affects Versions: 1.4.0
Reporter: Attila Doroszlai
According to the following javadoc, both decommission and maintenance replicas
should be deprioritised:
{code:title=https://github.com/apache/ozone/blob/6d9002201e58dc995dc133941acaef2af03cb9d2/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerHealthResult.java#L145-L164}
/**
 * The weightedRedundancy, is the remaining redundancy + the requeue count.
 * When this value is used for ordering in a priority queue it ensures the
 * priority is reduced each time it is requeued, to prevent it from blocking
 * other containers from being processed.
 * Additionally, so that decommission and maintenance replicas are not
 * ordered ahead of under-replicated replicas, a redundancy of
 * DECOMMISSION_REDUNDANCY is used for the decommission redundancy rather
 * than its real redundancy.
 * @return The weightedRedundancy of this result.
 */
public int getWeightedRedundancy() {
  int result = requeueCount;
  if (dueToDecommission) {
    result += DECOMMISSION_REDUNDANCY;
  } else {
    result += getRemainingRedundancy();
  }
  return result;
}
{code}
However, {{dueToDecommission}} is set to {{true}} based only on decommission replicas ({{decommissionCount}}); maintenance replicas ({{maintenanceCount}}) are ignored:
{code:title=https://github.com/apache/ozone/blob/6d9002201e58dc995dc133941acaef2af03cb9d2/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/RatisContainerReplicaCount.java#L520-L533}
/**
 * Checks whether insufficient replication is because of some replicas
 * being on datanodes that were decommissioned.
 * @param includePendingAdd if pending adds should be considered
 * @return true if there is insufficient replication and it's because of
 * decommissioning.
 */
public boolean inSufficientDueToDecommission(boolean includePendingAdd) {
  if (isSufficientlyReplicated(includePendingAdd)) {
    return false;
  }
  int delta = redundancyDelta(true, includePendingAdd);
  return decommissionCount >= delta;
}
{code}
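The effect on queue ordering can be reproduced with a small standalone sketch. It is not Ozone code: the class {{MaintenancePrioritySketch}}, the nested {{Result}} type, the container names, the input numbers and the value 5 for {{DECOMMISSION_REDUNDANCY}} are all assumptions made for illustration. Only the arithmetic of the two quoted methods is carried over.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MaintenancePrioritySketch {

  // Assumed value, for illustration only; the real constant is defined in
  // ContainerHealthResult.
  static final int DECOMMISSION_REDUNDANCY = 5;

  /** Hypothetical, simplified stand-in for an under-replicated result. */
  static final class Result {
    final String name;
    final int remainingRedundancy;
    final int requeueCount;
    final boolean dueToDecommission;

    Result(String name, int remainingRedundancy, int requeueCount,
        boolean dueToDecommission) {
      this.name = name;
      this.remainingRedundancy = remainingRedundancy;
      this.requeueCount = requeueCount;
      this.dueToDecommission = dueToDecommission;
    }

    // Same arithmetic as the quoted getWeightedRedundancy().
    int getWeightedRedundancy() {
      return requeueCount
          + (dueToDecommission ? DECOMMISSION_REDUNDANCY : remainingRedundancy);
    }
  }

  // Same comparison as the quoted inSufficientDueToDecommission():
  // the number of maintenance replicas plays no part in the decision.
  public static boolean insufficientDueToDecommission(
      boolean sufficientlyReplicated, int delta, int decommissionCount) {
    if (sufficientlyReplicated) {
      return false;
    }
    return decommissionCount >= delta;
  }

  /** Returns container names in the order a min-first queue yields them. */
  public static List<String> processingOrder() {
    PriorityQueue<Result> queue = new PriorityQueue<>(
        Comparator.comparingInt(Result::getWeightedRedundancy));
    // Replica genuinely lost, already requeued once: weight 1 + 1 = 2.
    queue.add(new Result("lost-replica", 1, 1,
        insufficientDueToDecommission(false, 1, 0)));
    // Replica on a decommissioning node: weight 0 + 5 = 5.
    queue.add(new Result("decommission", 1, 0,
        insufficientDueToDecommission(false, 1, 1)));
    // Replica on a maintenance node, so decommissionCount is 0 and the
    // flag stays false: weight 0 + 1 = 1.
    queue.add(new Result("maintenance", 1, 0,
        insufficientDueToDecommission(false, 1, 0)));

    List<String> order = new ArrayList<>();
    while (!queue.isEmpty()) {
      order.add(queue.poll().name);
    }
    return order;
  }

  public static void main(String[] args) {
    // Prints [maintenance, lost-replica, decommission]
    System.out.println(processingOrder());
  }
}
```

With these hypothetical inputs the maintenance-caused entry is polled before the container that actually lost a replica, while the decommission-caused entry is pushed to the back as the javadoc describes.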