Attila Doroszlai created HDDS-8617:
--------------------------------------

             Summary: Ratis underreplication due to maintenance is not deprioritised
                 Key: HDDS-8617
                 URL: https://issues.apache.org/jira/browse/HDDS-8617
             Project: Apache Ozone
          Issue Type: Sub-task
          Components: SCM
    Affects Versions: 1.4.0
            Reporter: Attila Doroszlai


According to the following javadoc, both decommission and maintenance replicas 
should be deprioritised:

{code:title=https://github.com/apache/ozone/blob/6d9002201e58dc995dc133941acaef2af03cb9d2/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerHealthResult.java#L145-L164}
    /**
     * The weightedRedundancy, is the remaining redundancy + the requeue count.
     * When this value is used for ordering in a priority queue it ensures the
     * priority is reduced each time it is requeued, to prevent it from blocking
     * other containers from being processed.
     * Additionally, so that decommission and maintenance replicas are not
     * ordered ahead of under-replicated replicas, a redundancy of
     * DECOMMISSION_REDUNDANCY is used for the decommission redundancy rather
     * than its real redundancy.
     * @return The weightedRedundancy of this result.
     */
    public int getWeightedRedundancy() {
      int result = requeueCount;
      if (dueToDecommission) {
        result += DECOMMISSION_REDUNDANCY;
      } else {
        result += getRemainingRedundancy();
      }
      return result;
    }
{code}
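The following minimal Java sketch shows the ordering this javadoc implies: a priority queue keyed on weighted redundancy, where lower values are polled first. {{WeightedRedundancyDemo}}, {{UnderReplicatedResult}} and the value 5 for {{DECOMMISSION_REDUNDANCY}} are hypothetical stand-ins chosen for illustration, not Ozone's actual classes or constant. Only the weighting logic mirrors {{getWeightedRedundancy()}} above.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical model of the ordering described in the javadoc above.
// Not Ozone code: class names and the constant's value are assumptions.
public class WeightedRedundancyDemo {

  // Assumed value for illustration only; the real constant lives in
  // ContainerHealthResult and may differ.
  static final int DECOMMISSION_REDUNDANCY = 5;

  record UnderReplicatedResult(String containerId, int remainingRedundancy,
      boolean dueToDecommission, int requeueCount) {

    // Same shape as getWeightedRedundancy(): requeue count plus either the
    // fixed decommission redundancy or the real remaining redundancy.
    int weightedRedundancy() {
      int result = requeueCount;
      if (dueToDecommission) {
        result += DECOMMISSION_REDUNDANCY;
      } else {
        result += remainingRedundancy;
      }
      return result;
    }
  }

  // Lower weighted redundancy means more urgent, so it is polled first.
  static PriorityQueue<UnderReplicatedResult> newQueue() {
    return new PriorityQueue<>(
        Comparator.comparingInt(UnderReplicatedResult::weightedRedundancy));
  }

  public static void main(String[] args) {
    PriorityQueue<UnderReplicatedResult> queue = newQueue();
    // Shortfall caused only by decommissioning: pushed back to weight 5.
    queue.add(new UnderReplicatedResult("c1", 1, true, 0));
    // Genuinely lost a replica, with one replica of redundancy left.
    queue.add(new UnderReplicatedResult("c2", 1, false, 0));
    // Same situation as c2, but already requeued three times.
    queue.add(new UnderReplicatedResult("c3", 1, false, 3));
    while (!queue.isEmpty()) {
      UnderReplicatedResult r = queue.poll();
      System.out.println(r.containerId() + " weight=" + r.weightedRedundancy());
    }
  }
}
```

Running {{main}} polls c2 (weight 1), then c3 (weight 4), then c1 (weight 5). Without the {{dueToDecommission}} override, c1 would tie with c2 at weight 1, which is the situation a maintenance-only shortfall ends up in when the flag is not set.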

but {{dueToDecommission=true}} is set based only on the decommissioning replicas ({{decommissionCount}}); maintenance replicas ({{maintenanceCount}}) are ignored:

{code:title=https://github.com/apache/ozone/blob/6d9002201e58dc995dc133941acaef2af03cb9d2/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/RatisContainerReplicaCount.java#L520-L533}
  /**
   * Checks whether insufficient replication is because of some replicas
   * being on datanodes that were decommissioned.
   * @param includePendingAdd if pending adds should be considered
   * @return true if there is insufficient replication and it's because of
   * decommissioning.
   */
  public boolean inSufficientDueToDecommission(boolean includePendingAdd) {
    if (isSufficientlyReplicated(includePendingAdd)) {
      return false;
    }
    int delta = redundancyDelta(true, includePendingAdd);
    return decommissionCount >= delta;
  }
{code}
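The gap can be illustrated with a simplified Java model. {{DueToOutOfServiceDemo}} and {{insufficientDueToOutOfService}} are hypothetical names, and {{delta}} is assumed to be the number of replicas the container is short by, a simplification of whatever {{redundancyDelta(true, includePendingAdd)}} actually returns. The second method is only one possible direction for a fix, not the committed change.

```java
// Hypothetical, simplified model of the check quoted above. Not Ozone code.
public class DueToOutOfServiceDemo {

  // Mirrors the quoted logic: the shortfall is attributed to decommission
  // only if decommissioning replicas alone cover it; maintenance replicas
  // play no part in the decision.
  static boolean insufficientDueToDecommission(
      boolean sufficientlyReplicated, int delta, int decommissionCount) {
    if (sufficientlyReplicated) {
      return false;
    }
    return decommissionCount >= delta;
  }

  // One possible direction (an assumption): also count maintenance replicas
  // when deciding whether out-of-service nodes explain the shortfall.
  static boolean insufficientDueToOutOfService(
      boolean sufficientlyReplicated, int delta,
      int decommissionCount, int maintenanceCount) {
    if (sufficientlyReplicated) {
      return false;
    }
    return decommissionCount + maintenanceCount >= delta;
  }

  public static void main(String[] args) {
    // Insufficiently replicated container, short by one replica,
    // where that replica sits on a node in maintenance.
    int delta = 1;
    int decommissionCount = 0;
    int maintenanceCount = 1;
    System.out.println(
        insufficientDueToDecommission(false, delta, decommissionCount));
    System.out.println(insufficientDueToOutOfService(
        false, delta, decommissionCount, maintenanceCount));
  }
}
```

For that container the first method prints {{false}}, so the result keeps its real remaining redundancy and is not deprioritised, while the second prints {{true}}.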



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
