kerneltime commented on code in PR #7293:
URL: https://github.com/apache/ozone/pull/7293#discussion_r1846898120
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java:
##########
@@ -564,7 +565,8 @@ private void createContainerMerkleTree(Container container)
{
merkleTree.addChunks(blockData.getLocalID(), chunkInfos);
}
}
- checksumManager.writeContainerDataTree(containerData, merkleTree);
+ checksumManager.writeContainerDataTree(containerData, captureLatencyNs(
+ checksumManager.getMetrics().getCreateMerkleTreeLatencyNS(),
merkleTree::toProto));
Review Comment:
NIT: `getCreateMerkleTreeLatencyNS` should be `getWriteMerkleTreeLatencyNS`
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -245,11 +387,49 @@ public static boolean checksumFileExist(Container
container) {
* This class represents the difference between our replica of a container
and a peer's replica of a container.
* It summarizes the operations we need to do to reconcile our replica with
the peer replica it was compared to.
*
- * TODO HDDS-10928
*/
public static class ContainerDiff {
Review Comment:
We can make this a stand-alone class instead of embedding it here.
```suggestion
public static class ContainerDiffReport {
```
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -79,7 +85,7 @@ public void stop() {
* file remains unchanged.
* Concurrent writes to the same file are coordinated internally.
*/
- public void writeContainerDataTree(ContainerData data, ContainerMerkleTree
tree) throws IOException {
+ public void writeContainerDataTree(ContainerData data,
ContainerProtos.ContainerMerkleTree tree) throws IOException {
Review Comment:
I agree; instead of having `getCreateMerkleTreeLatencyNS` be the caller's
responsibility, it should be calculated here. Calls such as
`createContainerMerkleTree` should have their metric as it involves reading
from DB and iterating blocks, and we should know the time it takes to do that.
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -143,12 +149,148 @@ public void markBlocksAsDeleted(KeyValueContainerData
data, Collection<Long> del
}
}
- public ContainerDiff diff(KeyValueContainerData thisContainer,
ContainerProtos.ContainerChecksumInfo otherInfo)
- throws IOException {
- // TODO HDDS-10928 compare the checksum info of the two containers and
return a summary.
- // Callers can act on this summary to repair their container replica
using the peer's replica.
- // This method will use the read lock, which is unused in the current
implementation.
- return new ContainerDiff();
+ public ContainerDiff diff(KeyValueContainerData thisContainer,
ContainerProtos.ContainerChecksumInfo peerChecksumInfo)
+ throws Exception {
+ ContainerDiff report = new ContainerDiff();
+ try {
+ captureLatencyNs(metrics.getMerkleTreeDiffLatencyNS(), () -> {
+ Preconditions.assertNotNull(thisContainer, "Container data is null");
+ Preconditions.assertNotNull(peerChecksumInfo, "Peer checksum info is
null");
+ Optional<ContainerProtos.ContainerChecksumInfo.Builder>
thisContainerChecksumInfoBuilder =
+ read(thisContainer);
+ if (!thisContainerChecksumInfoBuilder.isPresent()) {
+ throw new IOException("The container #" +
thisContainer.getContainerID() +
+ " doesn't have container checksum");
+ }
+
+ if (thisContainer.getContainerID() !=
peerChecksumInfo.getContainerID()) {
+ throw new IOException("Container Id does not match for container " +
thisContainer.getContainerID());
Review Comment:
Should we preserve the fidelity of the two different scenarios in which this
method raises an exception? If the file does not have any content. Also. `read`
can throw an exception. The container ID not matching is a more catastrophic
error as that would be a coding bug vs issues with filesystem read.
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -143,12 +149,148 @@ public void markBlocksAsDeleted(KeyValueContainerData
data, Collection<Long> del
}
}
- public ContainerDiff diff(KeyValueContainerData thisContainer,
ContainerProtos.ContainerChecksumInfo otherInfo)
- throws IOException {
- // TODO HDDS-10928 compare the checksum info of the two containers and
return a summary.
- // Callers can act on this summary to repair their container replica
using the peer's replica.
- // This method will use the read lock, which is unused in the current
implementation.
- return new ContainerDiff();
+ public ContainerDiff diff(KeyValueContainerData thisContainer,
ContainerProtos.ContainerChecksumInfo peerChecksumInfo)
+ throws Exception {
+ ContainerDiff report = new ContainerDiff();
+ try {
+ captureLatencyNs(metrics.getMerkleTreeDiffLatencyNS(), () -> {
+ Preconditions.assertNotNull(thisContainer, "Container data is null");
+ Preconditions.assertNotNull(peerChecksumInfo, "Peer checksum info is
null");
+ Optional<ContainerProtos.ContainerChecksumInfo.Builder>
thisContainerChecksumInfoBuilder =
+ read(thisContainer);
+ if (!thisContainerChecksumInfoBuilder.isPresent()) {
+ throw new IOException("The container #" +
thisContainer.getContainerID() +
+ " doesn't have container checksum");
+ }
+
+ if (thisContainer.getContainerID() !=
peerChecksumInfo.getContainerID()) {
+ throw new IOException("Container Id does not match for container " +
thisContainer.getContainerID());
+ }
+
+ ContainerProtos.ContainerChecksumInfo thisChecksumInfo =
thisContainerChecksumInfoBuilder.get().build();
+ compareContainerMerkleTree(thisChecksumInfo, peerChecksumInfo, report);
+ });
+ } catch (Exception ex) {
+ metrics.incrementMerkleTreeDiffFailures();
+ throw new Exception("Container Diff failed for container #" +
thisContainer.getContainerID(), ex);
+ }
+
+ // Update Container Diff metrics based on the diff report.
+ if (report.needsRepair()) {
+ metrics.incrementRepairContainerDiffs();
+ } else {
+ metrics.incrementNoRepairContainerDiffs();
+ }
+ metrics.incrementMerkleTreeDiffSuccesses();
+ return report;
+ }
Review Comment:
Do we need ` metrics.incrementMerkleTreeDiffSuccesses();`
```suggestion
// Update Container Diff metrics based on the diff report.
if (report.needsRepair()) {
metrics.incrementRepairContainerDiffs();
return report;
}
metrics.incrementNoRepairContainerDiffs();
return report;
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]