Re: [PR] HDDS-10928. Implement container comparison and repair logic within datanodes. [ozone]

via GitHub Wed, 16 Oct 2024 10:31:33 -0700


errose28 commented on code in PR #7293:
URL: https://github.com/apache/ozone/pull/7293#discussion_r1803456094



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -143,12 +148,81 @@ public void markBlocksAsDeleted(KeyValueContainerData 
data, Collection<Long> del
     }
   }
 
-  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo otherInfo)
+  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo peerChecksumInfo)
       throws IOException {
-    // TODO HDDS-10928 compare the checksum info of the two containers and 
return a summary.
-    //  Callers can act on this summary to repair their container replica 
using the peer's replica.
-    //  This method will use the read lock, which is unused in the current 
implementation.
-    return new ContainerDiff();
+    Preconditions.assertNotNull(thisContainer, "Container data is null");
+    Optional<ContainerProtos.ContainerChecksumInfo.Builder> 
thisContainerChecksumInfoBuilder =
+        read(thisContainer);
+    if (!thisContainerChecksumInfoBuilder.isPresent()) {
+      // TODO: To create containerMerkleTree or fail the request.

Review Comment:
   IMO we should fail the request for the caller and queue the container for 
on-demand scanning.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -143,12 +148,81 @@ public void markBlocksAsDeleted(KeyValueContainerData 
data, Collection<Long> del
     }
   }
 
-  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo otherInfo)
+  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo peerChecksumInfo)
       throws IOException {
-    // TODO HDDS-10928 compare the checksum info of the two containers and 
return a summary.
-    //  Callers can act on this summary to repair their container replica 
using the peer's replica.
-    //  This method will use the read lock, which is unused in the current 
implementation.
-    return new ContainerDiff();
+    Preconditions.assertNotNull(thisContainer, "Container data is null");
+    Optional<ContainerProtos.ContainerChecksumInfo.Builder> 
thisContainerChecksumInfoBuilder =
+        read(thisContainer);
+    if (!thisContainerChecksumInfoBuilder.isPresent()) {
+      // TODO: To create containerMerkleTree or fail the request.
+      return null;
+    }
+
+    if (thisContainer.getContainerID() != peerChecksumInfo.getContainerID()) {
+      throw new IOException("Container Id does not match");
+    }
+
+    ContainerProtos.ContainerChecksumInfo thisChecksumInfo = 
thisContainerChecksumInfoBuilder.get().build();
+
+    ContainerProtos.ContainerMerkleTree thisMerkleTree = 
thisChecksumInfo.getContainerMerkleTree();
+    ContainerProtos.ContainerMerkleTree peerMerkleTree = 
peerChecksumInfo.getContainerMerkleTree();
+
+    return compareContainerMerkleTree(thisMerkleTree, peerMerkleTree);

Review Comment:
   Each tree diff should have a one line info log summarizing what happened. 
For example, whether containers matched and how many of each type of difference 
were found. This info can come from a `toString` implementation in the 
`ContainerDiff` and we can just log that object. We might want to add this to 
the caller of this method though, since they will have the peer information. I 
think the current plan is that will happen in a follow-up PR.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -143,12 +148,81 @@ public void markBlocksAsDeleted(KeyValueContainerData 
data, Collection<Long> del
     }
   }
 
-  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo otherInfo)
+  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo peerChecksumInfo)
       throws IOException {
-    // TODO HDDS-10928 compare the checksum info of the two containers and 
return a summary.
-    //  Callers can act on this summary to repair their container replica 
using the peer's replica.
-    //  This method will use the read lock, which is unused in the current 
implementation.
-    return new ContainerDiff();
+    Preconditions.assertNotNull(thisContainer, "Container data is null");
+    Optional<ContainerProtos.ContainerChecksumInfo.Builder> 
thisContainerChecksumInfoBuilder =
+        read(thisContainer);
+    if (!thisContainerChecksumInfoBuilder.isPresent()) {
+      // TODO: To create containerMerkleTree or fail the request.
+      return null;
+    }
+
+    if (thisContainer.getContainerID() != peerChecksumInfo.getContainerID()) {
+      throw new IOException("Container Id does not match");
+    }
+
+    ContainerProtos.ContainerChecksumInfo thisChecksumInfo = 
thisContainerChecksumInfoBuilder.get().build();
+
+    ContainerProtos.ContainerMerkleTree thisMerkleTree = 
thisChecksumInfo.getContainerMerkleTree();
+    ContainerProtos.ContainerMerkleTree peerMerkleTree = 
peerChecksumInfo.getContainerMerkleTree();
+
+    return compareContainerMerkleTree(thisMerkleTree, peerMerkleTree);
+  }
+
+  private ContainerDiff 
compareContainerMerkleTree(ContainerProtos.ContainerMerkleTree thisMerkleTree,
+                                                   
ContainerProtos.ContainerMerkleTree peerMerkleTree) {
+
+    ContainerDiff report = new ContainerDiff();

Review Comment:
   nit. We can intern the healthy/empty `ContainerDiff` object since they will 
all be identical.



##########
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/TestContainerChecksumTreeManager.java:
##########
@@ -299,6 +301,86 @@ public void testEmptyFile() throws Exception {
     assertEquals(CONTAINER_ID, info.getContainerID());
   }
 
+  @Test
+  public void testContainerWithNoDiff() throws IOException {
+    ContainerMerkleTree ourMerkleTree = buildTestTree(config);
+    ContainerMerkleTree peerMerkleTree = buildTestTree(config);
+    checksumManager.writeContainerDataTree(container, ourMerkleTree);
+    ContainerChecksumTreeManager containerChecksumTreeManager = new 
ContainerChecksumTreeManager(
+        new OzoneConfiguration());
+    ContainerProtos.ContainerChecksumInfo peerChecksumInfo = 
ContainerProtos.ContainerChecksumInfo.newBuilder()
+            .setContainerID(container.getContainerID())
+            .setContainerMerkleTree(peerMerkleTree.toProto()).build();
+    ContainerChecksumTreeManager.ContainerDiff diff =
+        containerChecksumTreeManager.diff(container, peerChecksumInfo);
+    Assertions.assertTrue(diff.getCorruptChunks().isEmpty());
+    Assertions.assertTrue(diff.getMissingBlocks().isEmpty());
+    Assertions.assertTrue(diff.getMissingChunks().isEmpty());

Review Comment:
   We can wrap these in one method to quickly check if we need to drill down 
into the diff. Something like `ContainerDiff#isHealthy`.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -143,12 +148,81 @@ public void markBlocksAsDeleted(KeyValueContainerData 
data, Collection<Long> del
     }
   }
 
-  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo otherInfo)
+  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo peerChecksumInfo)
       throws IOException {
-    // TODO HDDS-10928 compare the checksum info of the two containers and 
return a summary.
-    //  Callers can act on this summary to repair their container replica 
using the peer's replica.
-    //  This method will use the read lock, which is unused in the current 
implementation.
-    return new ContainerDiff();
+    Preconditions.assertNotNull(thisContainer, "Container data is null");
+    Optional<ContainerProtos.ContainerChecksumInfo.Builder> 
thisContainerChecksumInfoBuilder =
+        read(thisContainer);
+    if (!thisContainerChecksumInfoBuilder.isPresent()) {
+      // TODO: To create containerMerkleTree or fail the request.
+      return null;
+    }
+
+    if (thisContainer.getContainerID() != peerChecksumInfo.getContainerID()) {
+      throw new IOException("Container Id does not match");
+    }
+
+    ContainerProtos.ContainerChecksumInfo thisChecksumInfo = 
thisContainerChecksumInfoBuilder.get().build();
+
+    ContainerProtos.ContainerMerkleTree thisMerkleTree = 
thisChecksumInfo.getContainerMerkleTree();
+    ContainerProtos.ContainerMerkleTree peerMerkleTree = 
peerChecksumInfo.getContainerMerkleTree();
+
+    return compareContainerMerkleTree(thisMerkleTree, peerMerkleTree);
+  }
+
+  private ContainerDiff 
compareContainerMerkleTree(ContainerProtos.ContainerMerkleTree thisMerkleTree,
+                                                   
ContainerProtos.ContainerMerkleTree peerMerkleTree) {
+
+    ContainerDiff report = new ContainerDiff();
+    if (thisMerkleTree.getDataChecksum() == peerMerkleTree.getDataChecksum()) {
+      return new ContainerDiff();
+    }
+
+    Map<Long, ContainerProtos.BlockMerkleTree> thisBlockMerkleTreeMap = 
thisMerkleTree.getBlockMerkleTreeList()
+        
.stream().collect(Collectors.toMap(ContainerProtos.BlockMerkleTree::getBlockID,
+            blockMerkleTree -> blockMerkleTree));
+
+    // Since we are reconciling our container with a peer, We only need to go 
through the peer's block list
+    for (ContainerProtos.BlockMerkleTree peerBlockMerkleTree: 
peerMerkleTree.getBlockMerkleTreeList()) {
+      // Check if our container has the peer block.
+      ContainerProtos.BlockMerkleTree thisBlockMerkleTree = 
thisBlockMerkleTreeMap.get(
+          peerBlockMerkleTree.getBlockID());
+      if (thisBlockMerkleTree == null) {
+        report.addMissingBlock(peerBlockMerkleTree);
+        continue;
+      }
+
+      if (thisBlockMerkleTree.getBlockChecksum() != 
peerBlockMerkleTree.getBlockChecksum()) {
+        compareBlockMerkleTree(report, thisBlockMerkleTree, 
peerBlockMerkleTree);
+      }
+    }
+    return report;
+  }
+
+  private void compareBlockMerkleTree(ContainerDiff report, 
ContainerProtos.BlockMerkleTree thisBlockMerkleTree,
+                                      ContainerProtos.BlockMerkleTree 
peerBlockMerkleTree) {
+    Map<Long, ContainerProtos.ChunkMerkleTree> thisChunkMerkleTreeMap = 
thisBlockMerkleTree.getChunkMerkleTreeList()
+        
.stream().collect(Collectors.toMap(ContainerProtos.ChunkMerkleTree::getOffset,
+            chunkMerkleTree -> chunkMerkleTree));
+    List<ContainerProtos.ChunkMerkleTree> peerChunkMerkleTreeList = 
peerBlockMerkleTree.getChunkMerkleTreeList();
+
+    // Since we are reconciling our container with a peer, We only need to go 
through the peer's chunk list
+    for (ContainerProtos.ChunkMerkleTree peerChunkMerkleTree: 
peerChunkMerkleTreeList) {
+      ContainerProtos.ChunkMerkleTree thisChunkMerkleTree = 
thisChunkMerkleTreeMap.get(
+          peerChunkMerkleTree.getOffset());
+      if (thisChunkMerkleTree == null) {
+        report.addMissingChunk(peerChunkMerkleTree);
+        continue;
+      }
+
+      if (thisChunkMerkleTree.getChunkChecksum() != 
peerChunkMerkleTree.getChunkChecksum()) {
+        ChunkArgs thisChunkArgs = new 
ChunkArgs(thisBlockMerkleTree.getBlockID(), thisChunkMerkleTree.getOffset(),
+            thisChunkMerkleTree.getLength(), 
thisChunkMerkleTree.getChunkChecksum());
+        ChunkArgs peerChunkArgs = new 
ChunkArgs(peerBlockMerkleTree.getBlockID(), peerChunkMerkleTree.getOffset(),
+            peerChunkMerkleTree.getLength(), 
peerChunkMerkleTree.getChunkChecksum());
+        report.addCorruptChunks(Pair.of(thisChunkArgs, peerChunkArgs));

Review Comment:
   Why do we need both our and the peer's chunks? The most granular repair we 
are doing is at the chunk level, so we should be able to read the peer's whole 
chunk from just the block ID and chunk merkle tree.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -143,12 +148,81 @@ public void markBlocksAsDeleted(KeyValueContainerData 
data, Collection<Long> del
     }
   }
 
-  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo otherInfo)
+  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo peerChecksumInfo)

Review Comment:
   Can we add diff time to `ContainerMerkleTreeMetrics` as well?



##########
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/TestContainerChecksumTreeManager.java:
##########
@@ -299,6 +301,86 @@ public void testEmptyFile() throws Exception {
     assertEquals(CONTAINER_ID, info.getContainerID());
   }
 
+  @Test
+  public void testContainerWithNoDiff() throws IOException {
+    ContainerMerkleTree ourMerkleTree = buildTestTree(config);
+    ContainerMerkleTree peerMerkleTree = buildTestTree(config);
+    checksumManager.writeContainerDataTree(container, ourMerkleTree);
+    ContainerChecksumTreeManager containerChecksumTreeManager = new 
ContainerChecksumTreeManager(
+        new OzoneConfiguration());
+    ContainerProtos.ContainerChecksumInfo peerChecksumInfo = 
ContainerProtos.ContainerChecksumInfo.newBuilder()
+            .setContainerID(container.getContainerID())
+            .setContainerMerkleTree(peerMerkleTree.toProto()).build();
+    ContainerChecksumTreeManager.ContainerDiff diff =
+        containerChecksumTreeManager.diff(container, peerChecksumInfo);
+    Assertions.assertTrue(diff.getCorruptChunks().isEmpty());
+    Assertions.assertTrue(diff.getMissingBlocks().isEmpty());
+    Assertions.assertTrue(diff.getMissingChunks().isEmpty());
+  }
+
+  @Test
+  public void testContainerDiffWithMissingBlocksAndChunks() throws IOException 
{
+    ContainerMerkleTree ourMerkleTree = buildTestTreeWithMismatches(config, 
true, true, false);
+    ContainerMerkleTree peerMerkleTree = buildTestTree(config);
+    checksumManager.writeContainerDataTree(container, ourMerkleTree);
+    ContainerChecksumTreeManager containerChecksumTreeManager = new 
ContainerChecksumTreeManager(
+        new OzoneConfiguration());
+    ContainerProtos.ContainerChecksumInfo peerChecksumInfo = 
ContainerProtos.ContainerChecksumInfo.newBuilder()
+        .setContainerID(container.getContainerID())
+        .setContainerMerkleTree(peerMerkleTree.toProto()).build();
+    ContainerChecksumTreeManager.ContainerDiff diff =
+        containerChecksumTreeManager.diff(container, peerChecksumInfo);
+    Assertions.assertTrue(diff.getCorruptChunks().isEmpty());
+    Assertions.assertFalse(diff.getMissingBlocks().isEmpty());
+    Assertions.assertFalse(diff.getMissingChunks().isEmpty());
+
+    Assertions.assertEquals(diff.getMissingBlocks().size(), 1);
+    Assertions.assertEquals(diff.getMissingChunks().size(), 1);
+  }
+
+  @Test
+  public void testContainerDiffWithMissingBlocksAndMismatchChunks() throws 
IOException {
+    ContainerMerkleTree ourMerkleTree = buildTestTreeWithMismatches(config, 
true, false, true);
+    ContainerMerkleTree peerMerkleTree = buildTestTree(config);
+    checksumManager.writeContainerDataTree(container, ourMerkleTree);
+    ContainerChecksumTreeManager containerChecksumTreeManager = new 
ContainerChecksumTreeManager(
+        new OzoneConfiguration());
+    ContainerProtos.ContainerChecksumInfo peerChecksumInfo = 
ContainerProtos.ContainerChecksumInfo.newBuilder()
+        .setContainerID(container.getContainerID())
+        .setContainerMerkleTree(peerMerkleTree.toProto()).build();
+    ContainerChecksumTreeManager.ContainerDiff diff =
+        containerChecksumTreeManager.diff(container, peerChecksumInfo);
+    Assertions.assertFalse(diff.getCorruptChunks().isEmpty());
+    Assertions.assertFalse(diff.getMissingBlocks().isEmpty());
+    Assertions.assertTrue(diff.getMissingChunks().isEmpty());
+
+    Assertions.assertEquals(diff.getCorruptChunks().size(), 1);
+    Assertions.assertEquals(diff.getMissingBlocks().size(), 1);
+  }
+
+  /**
+   * Test if a peer which has missing blocks and chunks affects our container 
diff.
+   * Only if our merkle tree has missing entries from the peer we need to add 
it the Container Diff.
+   */
+  @Test
+  public void testPeerWithMissingBlockAndMissingChunks() throws IOException {

Review Comment:
   Here's some high level ideas for test improvements:
   - `buildTestTreeWithMismatches` can take counters for the number of each 
type of failure to inject, instead of just a boolean.
   - The test can be parameterized because each input maps to the result of the 
list we check at the end. This way we can quickly check many different 
combinations of diffs.
   - `buildTestTreeWithMismatches` can also return the expected 
`ContainerDiff`. Then add an `equals` method to `ContainerDiff` that checks 
IDs, offsets, etc and just assert that is true at the end of the test.
   - A second test runs the same combinations in reverse, with the failures 
happening in the peer's tree. The difference is that this one always asserts 
our tree is healthy.
   
   I think this would distill most of this into just two small parameterized 
tests. We may need a third case to test when both trees have failures. On top 
of this we should add tests for all exceptional cases.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -143,12 +148,81 @@ public void markBlocksAsDeleted(KeyValueContainerData 
data, Collection<Long> del
     }
   }
 
-  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo otherInfo)
+  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo peerChecksumInfo)
       throws IOException {
-    // TODO HDDS-10928 compare the checksum info of the two containers and 
return a summary.
-    //  Callers can act on this summary to repair their container replica 
using the peer's replica.
-    //  This method will use the read lock, which is unused in the current 
implementation.
-    return new ContainerDiff();
+    Preconditions.assertNotNull(thisContainer, "Container data is null");
+    Optional<ContainerProtos.ContainerChecksumInfo.Builder> 
thisContainerChecksumInfoBuilder =
+        read(thisContainer);
+    if (!thisContainerChecksumInfoBuilder.isPresent()) {
+      // TODO: To create containerMerkleTree or fail the request.
+      return null;
+    }
+
+    if (thisContainer.getContainerID() != peerChecksumInfo.getContainerID()) {
+      throw new IOException("Container Id does not match");
+    }
+
+    ContainerProtos.ContainerChecksumInfo thisChecksumInfo = 
thisContainerChecksumInfoBuilder.get().build();
+
+    ContainerProtos.ContainerMerkleTree thisMerkleTree = 
thisChecksumInfo.getContainerMerkleTree();
+    ContainerProtos.ContainerMerkleTree peerMerkleTree = 
peerChecksumInfo.getContainerMerkleTree();
+
+    return compareContainerMerkleTree(thisMerkleTree, peerMerkleTree);
+  }
+
+  private ContainerDiff 
compareContainerMerkleTree(ContainerProtos.ContainerMerkleTree thisMerkleTree,
+                                                   
ContainerProtos.ContainerMerkleTree peerMerkleTree) {
+
+    ContainerDiff report = new ContainerDiff();
+    if (thisMerkleTree.getDataChecksum() == peerMerkleTree.getDataChecksum()) {
+      return new ContainerDiff();
+    }
+
+    Map<Long, ContainerProtos.BlockMerkleTree> thisBlockMerkleTreeMap = 
thisMerkleTree.getBlockMerkleTreeList()

Review Comment:
   Since both lists are sorted I think we should just walk each one with a 
pointer and compare the block IDs or chunk offsets (whichever is used to sort 
for the current structure):
   1. Start one pointer at the beginning of each list.
   2. If the sorting values match and the checksums match, move both pointers 
forward.
   3. If the sorting values match but the checksums don't match, continue down 
to the chunk level.
   4. If the value of our tree is less, keep moving our pointer forward until 
it is equal or past the peer.
       - These are blocks we have but the peer does not. No action is required 
from our side.
   5. If the value of the peer's tree is less, keep moving the peer pointer 
forward until is equal or past our current value.
       - These are blocks that the peer has but we do not. We should mark them 
as missing.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -143,12 +148,81 @@ public void markBlocksAsDeleted(KeyValueContainerData 
data, Collection<Long> del
     }
   }
 
-  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo otherInfo)
+  public ContainerDiff diff(KeyValueContainerData thisContainer, 
ContainerProtos.ContainerChecksumInfo peerChecksumInfo)
       throws IOException {
-    // TODO HDDS-10928 compare the checksum info of the two containers and 
return a summary.
-    //  Callers can act on this summary to repair their container replica 
using the peer's replica.
-    //  This method will use the read lock, which is unused in the current 
implementation.
-    return new ContainerDiff();
+    Preconditions.assertNotNull(thisContainer, "Container data is null");
+    Optional<ContainerProtos.ContainerChecksumInfo.Builder> 
thisContainerChecksumInfoBuilder =
+        read(thisContainer);
+    if (!thisContainerChecksumInfoBuilder.isPresent()) {
+      // TODO: To create containerMerkleTree or fail the request.
+      return null;
+    }
+
+    if (thisContainer.getContainerID() != peerChecksumInfo.getContainerID()) {
+      throw new IOException("Container Id does not match");
+    }
+
+    ContainerProtos.ContainerChecksumInfo thisChecksumInfo = 
thisContainerChecksumInfoBuilder.get().build();
+
+    ContainerProtos.ContainerMerkleTree thisMerkleTree = 
thisChecksumInfo.getContainerMerkleTree();
+    ContainerProtos.ContainerMerkleTree peerMerkleTree = 
peerChecksumInfo.getContainerMerkleTree();
+
+    return compareContainerMerkleTree(thisMerkleTree, peerMerkleTree);
+  }
+
+  private ContainerDiff 
compareContainerMerkleTree(ContainerProtos.ContainerMerkleTree thisMerkleTree,
+                                                   
ContainerProtos.ContainerMerkleTree peerMerkleTree) {
+
+    ContainerDiff report = new ContainerDiff();
+    if (thisMerkleTree.getDataChecksum() == peerMerkleTree.getDataChecksum()) {
+      return new ContainerDiff();
+    }
+
+    Map<Long, ContainerProtos.BlockMerkleTree> thisBlockMerkleTreeMap = 
thisMerkleTree.getBlockMerkleTreeList()
+        
.stream().collect(Collectors.toMap(ContainerProtos.BlockMerkleTree::getBlockID,
+            blockMerkleTree -> blockMerkleTree));
+
+    // Since we are reconciling our container with a peer, We only need to go 
through the peer's block list
+    for (ContainerProtos.BlockMerkleTree peerBlockMerkleTree: 
peerMerkleTree.getBlockMerkleTreeList()) {
+      // Check if our container has the peer block.
+      ContainerProtos.BlockMerkleTree thisBlockMerkleTree = 
thisBlockMerkleTreeMap.get(
+          peerBlockMerkleTree.getBlockID());
+      if (thisBlockMerkleTree == null) {
+        report.addMissingBlock(peerBlockMerkleTree);
+        continue;
+      }
+
+      if (thisBlockMerkleTree.getBlockChecksum() != 
peerBlockMerkleTree.getBlockChecksum()) {

Review Comment:
   More than just the checksums not matching, we need to know that our checksum 
is incorrect according to the client checksums and the peer's checksum is 
correct. There's two ways to do this:
   - When we get to the bottom of the tree, check all the checksums within the 
chunk to determine which peer is correct.
   - Add an `unhealthy` flag to each node of the tree to indicate a checksum 
mismatch was detected in its subtree. This would be filled in as part of 
HDDS-10374
   
   IMO the second option is better. We can add the `unhealthy` flag here and 
manually set it for unit testing until the scanner sets it automatically.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HDDS-10928. Implement container comparison and repair logic within datanodes. [ozone]

Reply via email to