errose28 commented on code in PR #4655:
URL: https://github.com/apache/ozone/pull/4655#discussion_r1195815737


##########
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerInfo.java:
##########
@@ -60,6 +60,7 @@ public class ContainerInfo implements 
Comparator<ContainerInfo>,
   */
   private volatile long usedBytes;
   private long numberOfKeys;
+  private boolean isAllReplicaEmpty;

Review Comment:
   This information can be learned from the replicas. I don't think we should 
persist it with the main container object. What happens if a non-empty replica 
shows up after this flag has been set?



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerData.java:
##########
@@ -537,6 +540,18 @@ public long getBlockCount() {
     return this.blockCount.get();
   }
 
+  public boolean isEmpty() {
+    return isEmpty;
+  }

Review Comment:
   HDDS-8142 also added an `isEmpty` method to the Container class itself. Can 
we reconcile these two methods to check for the flag being set and the presence 
of any blocks?



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/AbstractContainerReportHandler.java:
##########
@@ -370,6 +384,26 @@ private void updateContainerReplica(final DatanodeDetails 
datanodeDetails,
     }
   }
 
+  /**
+   * Determines whether container replica is empty or not.
+   *
+   * @param replicaProto container replica details.
+   * @return true if container replica is empty else false
+   */
+  private boolean fillEmpty(final ContainerReplicaProto replicaProto) {
+    if (replicaProto.hasIsEmpty()) {
+      return replicaProto.getIsEmpty();
+    } else {
+      // Handled when DN version is old then there will not be isEmpty field.
+      // In this case judge container empty based on keyCount
+      // and bytesUsed field.
+      if (replicaProto.getKeyCount() == 0 && replicaProto.getUsed() == 0) {
+        return true;
+      }

Review Comment:
   This value should be populated on DN startup when it is loading the 
containers into memory. There should not be a case where this value is unset 
since we do not support rolling upgrades with mixed versions yet.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/KeyValueContainerUtil.java:
##########
@@ -392,6 +397,24 @@ private static void 
initializeUsedBytesAndBlockCount(DatanodeStore store,
     kvData.setBlockCount(blockCount);
   }
 
+  /**
+   * A container is empty if:
+   * - The container is closed
+   * - There are no blocks in its block table.
+   *
+   * Empty containers are eligible for deletion.
+   */
+  public static boolean isEmpty(DatanodeStore store,

Review Comment:
   This is yet a third `isEmpty` method. We don't want any errors when checking 
if a container is empty, so there should be one clear method to call that 
handles all the checks.



##########
hadoop-hdds/interface-client/src/main/proto/hdds.proto:
##########
@@ -237,6 +237,7 @@ message ContainerInfoProto {
     optional ReplicationFactor replicationFactor  = 10;
     required ReplicationType replicationType  = 11;
     optional ECReplicationConfig ecReplicationConfig = 12;
+    optional bool isAllReplicaEmpty = 13 [default = true];

Review Comment:
   Similar comment as above, I don't think we should persist this value as part 
of the main container state in SCM. And it *definitely* should not default to 
true.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/health/EmptyContainerHandler.java:
##########
@@ -88,9 +88,9 @@ public boolean handle(ContainerCheckRequest request) {
   private boolean isContainerEmptyAndClosed(final ContainerInfo container,
       final Set<ContainerReplica> replicas) {
     return container.getState() == HddsProtos.LifeCycleState.CLOSED &&
-        container.getNumberOfKeys() == 0 && replicas.stream()

Review Comment:
   > Do we need to have an isEmpty flag on the containerInfo. This would be set 
to true when all the replicas report as empty.
   
   I don't think we should have this. We should only track each piece of 
information in one place. We track if all the replicas are empty based on the 
`isEmpty` flag of each replica. Now there would be a duplicate value that may 
not match the state of all the current replicas. The safest thing is to check 
the isEmpty flag on all replicas before deleting, making this extra flag 
redundant.
   
   > For example, consider an edge case where some replicas are empty and some 
are non empty because of arbitrary reasons. If the non empty replicas are lost, 
then EmptyContainerHandler will now mark this container as EMPTY on the basis 
of reports from other replicas.
   
   Having another `areAllEmpty` flag on the `ContainerInfo` does not change 
this case, because that flag is still set based on all the replicas that SCM 
sees. It is no different from checking all replicas at the time of container 
delete.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to