sodonnel commented on a change in pull request #3147:
URL: https://github.com/apache/ozone/pull/3147#discussion_r818648700
##########
File path:
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/AbstractContainerReportHandler.java
##########
@@ -121,30 +122,77 @@ private void updateContainerStats(final DatanodeDetails datanodeDetails,
containerInfo.updateSequenceId(
replicaProto.getBlockCommitSequenceId());
}
+ if (containerInfo.getReplicationConfig().getReplicationType()
+ == HddsProtos.ReplicationType.EC) {
+ updateECContainerStats(containerInfo, replicaProto, datanodeDetails);
+ } else {
+ updateRatisContainerStats(containerInfo, replicaProto,
+ datanodeDetails);
+ }
+ }
+ }
+
+ private void updateRatisContainerStats(ContainerInfo containerInfo,
+ ContainerReplicaProto newReplica, DatanodeDetails newSource)
+ throws ContainerNotFoundException {
+ List<ContainerReplica> otherReplicas =
+ getOtherReplicas(containerInfo.containerID(), newSource);
+ long usedBytes = newReplica.getUsed();
+ long keyCount = newReplica.getKeyCount();
+ for (ContainerReplica r : otherReplicas) {
+ usedBytes = calculateUsage(containerInfo, usedBytes, r.getBytesUsed());
+ keyCount = calculateUsage(containerInfo, keyCount, r.getKeyCount());
+ }
+ updateContainerUsedAndKeys(containerInfo, usedBytes, keyCount);
+ }
+
+ private void updateECContainerStats(ContainerInfo containerInfo,
+ ContainerReplicaProto newReplica, DatanodeDetails newSource)
+ throws ContainerNotFoundException {
+ int dataNum =
+ ((ECReplicationConfig)containerInfo.getReplicationConfig()).getData();
+ // The first EC index and the parity indexes must all be the same size
+ // while the other data indexes may be smaller due to partial stripes.
+ // When calculating the stats, we only use the first data and parity and
+ // ignore the others. We only need to run the check if we are processing
Review comment:
Thinking about this more, I am stuck between two choices.
As this PR stands, we will track the size of the largest container in the
group, and then we can trigger the close of the container when that largest
container approaches the set container size limit (5GB). The DN will also
trigger a close on the same large container when it reaches 5GB, as it does not
know about the others.
When checking whether the container has space, we would have to divide the
requested size by dataNum to get the approximate amount that needs to go into
the largest container.
On the other hand, we can track the total space used in the container group
by summing the size of all the data containers. This means the container size
will grow to approx 5GB * dataNum. When deciding if the container has space for
a new block, we use dataNum * 5GB as the limit. The behaviour on the DNs is
unchanged.
It feels like it might be useful to track both the total data space used and
the "largest container in group size", as that may be useful for some reporting
at some point, but we have no real use for both of the numbers at the moment.
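To make the trade-off concrete, here is a minimal sketch of the two space checks described above. All names (hasSpaceLargestReplica, hasSpaceTotal, CONTAINER_LIMIT) are hypothetical and only illustrate the arithmetic; they are not part of the Ozone codebase.

```java
// Hypothetical sketch of the two EC container space-accounting choices.
// Assumes a 5GB per-container limit, as discussed in the comment above.
public class EcStatChoices {

  static final long CONTAINER_LIMIT = 5L * 1024 * 1024 * 1024; // 5GB

  // Choice 1: track the size of the largest container in the group.
  // A write of requestedBytes is striped across dataNum containers, so
  // roughly requestedBytes / dataNum lands in the largest one. Check
  // that share against the single-container limit.
  static boolean hasSpaceLargestReplica(long largestReplicaBytes,
      long requestedBytes, int dataNum) {
    return largestReplicaBytes + requestedBytes / dataNum
        <= CONTAINER_LIMIT;
  }

  // Choice 2: track the total space used across all data containers.
  // The tracked value grows toward dataNum * 5GB, so the limit is
  // scaled by dataNum and the requested size is used as-is.
  static boolean hasSpaceTotal(long totalBytes, long requestedBytes,
      int dataNum) {
    return totalBytes + requestedBytes
        <= CONTAINER_LIMIT * (long) dataNum;
  }
}
```

Under either choice the DN behaviour is the same; only the SCM-side bookkeeping and the limit it compares against differ.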
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]