kerneltime commented on code in PR #7293:
URL: https://github.com/apache/ozone/pull/7293#discussion_r1846898120


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java:
##########
@@ -564,7 +565,8 @@ private void createContainerMerkleTree(Container container) 
{
           merkleTree.addChunks(blockData.getLocalID(), chunkInfos);
         }
       }
-      checksumManager.writeContainerDataTree(containerData, merkleTree);
+      checksumManager.writeContainerDataTree(containerData, captureLatencyNs(
+          checksumManager.getMetrics().getCreateMerkleTreeLatencyNS(), 
merkleTree::toProto));

Review Comment:
   NIT: `getCreateMerkleTreeLatencyNS` should be `getWriteMerkleTreeLatencyNS`



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -245,11 +387,49 @@ public static boolean checksumFileExist(Container 
container) {
    * This class represents the difference between our replica of a container 
and a peer's replica of a container.
    * It summarizes the operations we need to do to reconcile our replica with 
the peer replica it was compared to.
    *
-   * TODO HDDS-10928
    */
   public static class ContainerDiff {

Review Comment:
   We can make this a stand-alone class instead of embedding it here. 
   ```suggestion
     public static class ContainerDiffReport {
   ```



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -79,7 +85,7 @@ public void stop() {
    * file remains unchanged.
    * Concurrent writes to the same file are coordinated internally.
    */
-  public void writeContainerDataTree(ContainerData data, ContainerMerkleTree 
tree) throws IOException {
+  public void writeContainerDataTree(ContainerData data, 
ContainerProtos.ContainerMerkleTree tree) throws IOException {

Review Comment:
   I agree; instead of having `getCreateMerkleTreeLatencyNS` be the caller's 
responsibility, it should be calculated here. Calls such as 
`createContainerMerkleTree` should have their metric as it involves reading 
from DB and iterating blocks, and we should know the time it takes to do that. 



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -143,12 +149,148 @@ public void markBlocksAsDeleted(KeyValueContainerData 
data, Collection<Long> del
     }
   }
 
-  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo otherInfo)
-      throws IOException {
-    // TODO HDDS-10928 compare the checksum info of the two containers and 
return a summary.
-    //  Callers can act on this summary to repair their container replica 
using the peer's replica.
-    //  This method will use the read lock, which is unused in the current 
implementation.
-    return new ContainerDiff();
+  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo peerChecksumInfo)
+      throws Exception {
+    ContainerDiff report = new ContainerDiff();
+    try {
+      captureLatencyNs(metrics.getMerkleTreeDiffLatencyNS(), () -> {
+        Preconditions.assertNotNull(thisContainer, "Container data is null");
+        Preconditions.assertNotNull(peerChecksumInfo, "Peer checksum info is 
null");
+        Optional<ContainerProtos.ContainerChecksumInfo.Builder> 
thisContainerChecksumInfoBuilder =
+            read(thisContainer);
+        if (!thisContainerChecksumInfoBuilder.isPresent()) {
+          throw new IOException("The container #" + 
thisContainer.getContainerID() +
+              " doesn't have container checksum");
+        }
+
+        if (thisContainer.getContainerID() != 
peerChecksumInfo.getContainerID()) {
+          throw new IOException("Container Id does not match for container " + 
thisContainer.getContainerID());

Review Comment:
   Should we preserve the fidelity of the two different scenarios in which this 
method raises an exception? If the file does not have any content. Also. `read` 
can throw an exception. The container ID not matching is a more catastrophic 
error as that would be a coding bug vs issues with filesystem read.  



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -143,12 +149,148 @@ public void markBlocksAsDeleted(KeyValueContainerData 
data, Collection<Long> del
     }
   }
 
-  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo otherInfo)
-      throws IOException {
-    // TODO HDDS-10928 compare the checksum info of the two containers and 
return a summary.
-    //  Callers can act on this summary to repair their container replica 
using the peer's replica.
-    //  This method will use the read lock, which is unused in the current 
implementation.
-    return new ContainerDiff();
+  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo peerChecksumInfo)
+      throws Exception {
+    ContainerDiff report = new ContainerDiff();
+    try {
+      captureLatencyNs(metrics.getMerkleTreeDiffLatencyNS(), () -> {
+        Preconditions.assertNotNull(thisContainer, "Container data is null");
+        Preconditions.assertNotNull(peerChecksumInfo, "Peer checksum info is 
null");
+        Optional<ContainerProtos.ContainerChecksumInfo.Builder> 
thisContainerChecksumInfoBuilder =
+            read(thisContainer);
+        if (!thisContainerChecksumInfoBuilder.isPresent()) {
+          throw new IOException("The container #" + 
thisContainer.getContainerID() +
+              " doesn't have container checksum");
+        }
+
+        if (thisContainer.getContainerID() != 
peerChecksumInfo.getContainerID()) {
+          throw new IOException("Container Id does not match for container " + 
thisContainer.getContainerID());
+        }
+
+        ContainerProtos.ContainerChecksumInfo thisChecksumInfo = 
thisContainerChecksumInfoBuilder.get().build();
+        compareContainerMerkleTree(thisChecksumInfo, peerChecksumInfo, report);
+      });
+    } catch (Exception ex) {
+      metrics.incrementMerkleTreeDiffFailures();
+      throw new Exception("Container Diff failed for container #" + 
thisContainer.getContainerID(), ex);
+    }
+
+    // Update Container Diff metrics based on the diff report.
+    if (report.needsRepair()) {
+      metrics.incrementRepairContainerDiffs();
+    } else {
+      metrics.incrementNoRepairContainerDiffs();
+    }
+    metrics.incrementMerkleTreeDiffSuccesses();
+    return report;
+  }

Review Comment:
   Do we need   ` metrics.incrementMerkleTreeDiffSuccesses();` 
   
   ```suggestion
       // Update Container Diff metrics based on the diff report.
       if (report.needsRepair()) {
         metrics.incrementRepairContainerDiffs();
          return report;
       } 
       metrics.incrementNoRepairContainerDiffs();
       return report;
     }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to