sodonnel commented on a change in pull request #3147:
URL: https://github.com/apache/ozone/pull/3147#discussion_r818648700
##########
File path:
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/AbstractContainerReportHandler.java
##########
@@ -121,30 +122,77 @@ private void updateContainerStats(final DatanodeDetails datanodeDetails,
containerInfo.updateSequenceId(
replicaProto.getBlockCommitSequenceId());
}
+ if (containerInfo.getReplicationConfig().getReplicationType()
+ == HddsProtos.ReplicationType.EC) {
+ updateECContainerStats(containerInfo, replicaProto, datanodeDetails);
+ } else {
+ updateRatisContainerStats(containerInfo, replicaProto,
+ datanodeDetails);
+ }
+ }
+ }
+
+ private void updateRatisContainerStats(ContainerInfo containerInfo,
+ ContainerReplicaProto newReplica, DatanodeDetails newSource)
+ throws ContainerNotFoundException {
+ List<ContainerReplica> otherReplicas =
+ getOtherReplicas(containerInfo.containerID(), newSource);
+ long usedBytes = newReplica.getUsed();
+ long keyCount = newReplica.getKeyCount();
+ for (ContainerReplica r : otherReplicas) {
+ usedBytes = calculateUsage(containerInfo, usedBytes, r.getBytesUsed());
+ keyCount = calculateUsage(containerInfo, keyCount, r.getKeyCount());
+ }
+ updateContainerUsedAndKeys(containerInfo, usedBytes, keyCount);
+ }
+
+ private void updateECContainerStats(ContainerInfo containerInfo,
+ ContainerReplicaProto newReplica, DatanodeDetails newSource)
+ throws ContainerNotFoundException {
+ int dataNum =
+ ((ECReplicationConfig)containerInfo.getReplicationConfig()).getData();
+ // The first EC index and the parity indexes must all be the same size
+ // while the other data indexes may be smaller due to partial stripes.
+ // When calculating the stats, we only use the first data and parity and
+ // ignore the others. We only need to run the check if we are processing
Review comment:
Thinking about this more, I am stuck between two choices.
As this PR stands, we will track the size of the largest container in the
group, and then we can trigger the close of the container when that largest
container approaches the set container size limit (5GB). The DN will also
trigger a close on the same large container when it reaches 5GB, as it does not
know about the others.
When checking whether the container has space, we would have to divide the
requested size by dataNum to get the approximate amount that needs to go into
the largest container.
On the other hand, we can track the total space used in the container group
by summing the size of all the data containers. This means the container size
will grow to approx 5GB * dataNum. When deciding if the container has space for
a new block, we use dataNum * 5GB as the limit. The behaviour on the DNs is
unchanged.
It feels like it might be useful to track both the total data space used and
the "largest container in group size", as that may be useful for some reporting
at some point, but we have no real use for both of the numbers at the moment.
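To make the trade-off concrete, here is a minimal sketch of the two space checks described above. All names (hasSpaceLargestReplica, hasSpaceTotal, CONTAINER_LIMIT) are hypothetical and only illustrate the arithmetic; they are not part of the Ozone codebase.

```java
// Hypothetical sketch of the two EC container space-accounting choices.
// Assumes a 5GB per-container limit, as discussed in the comment above.
public class EcStatChoices {

  static final long CONTAINER_LIMIT = 5L * 1024 * 1024 * 1024; // 5GB

  // Choice 1: track the size of the largest container in the group.
  // A write of requestedBytes is striped across dataNum containers, so
  // roughly requestedBytes / dataNum lands in the largest one. Check
  // that share against the single-container limit.
  static boolean hasSpaceLargestReplica(long largestReplicaBytes,
      long requestedBytes, int dataNum) {
    return largestReplicaBytes + requestedBytes / dataNum
        <= CONTAINER_LIMIT;
  }

  // Choice 2: track the total space used across all data containers.
  // The tracked value grows toward dataNum * 5GB, so the limit is
  // scaled by dataNum and the requested size is used as-is.
  static boolean hasSpaceTotal(long totalBytes, long requestedBytes,
      int dataNum) {
    return totalBytes + requestedBytes
        <= CONTAINER_LIMIT * (long) dataNum;
  }
}
```

Under either choice the DN behaviour is the same; only the SCM-side bookkeeping and the limit it compares against differ.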
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]