errose28 commented on code in PR #8204:
URL: https://github.com/apache/ozone/pull/8204#discussion_r2027901213
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java:
##########
@@ -378,6 +380,21 @@ public ByteString
getContainerChecksumInfo(KeyValueContainerData data) throws IO
}
}
+ public static Optional<ContainerProtos.ContainerChecksumInfo>
readChecksumInfo(KeyValueContainerData data) {
Review Comment:
In #7474 the `read` method is made public. That method is non-static but the
only field it uses is the metrics. Can we make the public `read` call this
method but wrap it in the metrics update to reduce duplication?
##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/checksum/TestContainerCommandReconciliation.java:
##########
@@ -247,6 +251,35 @@ public void testGetChecksumInfoSuccess() throws Exception {
}
}
+ @Test
+ public void testDataChecksumReportedAtSCM() throws Exception {
+ long containerID = writeDataAndGetContainer(true);
+ // Check non-zero checksum after container close
+ Set<ContainerReplica> containerReplicas =
cluster.getStorageContainerManager().getContainerManager()
+ .getContainerReplicas(ContainerID.valueOf(containerID));
+ for (ContainerReplica containerReplica: containerReplicas) {
+ assertNotEquals(0, containerReplica.getDataChecksum());
+ }
+
cluster.getStorageContainerLocationClient().reconcileContainer(containerID);
+ Thread.sleep(10000);
+
+ // Check non-zero checksum after container reconciliation
+ containerReplicas =
cluster.getStorageContainerManager().getContainerManager()
+ .getContainerReplicas(ContainerID.valueOf(containerID));
+ for (ContainerReplica containerReplica: containerReplicas) {
+ assertNotEquals(0, containerReplica.getDataChecksum());
+ }
+
+ // Check non-zero checksum after datanode restart
+ // Restarting all the nodes take more time in mini ozone cluster, so
restarting only one node
+ cluster.restartHddsDatanode(0, true);
+ containerReplicas =
cluster.getStorageContainerManager().getContainerManager()
+ .getContainerReplicas(ContainerID.valueOf(containerID));
+ for (ContainerReplica containerReplica: containerReplicas) {
+ assertNotEquals(0, containerReplica.getDataChecksum());
Review Comment:
This may still be getting the old container info. There's no check that the
DN has re-registered with SCM and sent the FCR between its restart and this
check. The node will not have gone stale/dead in this time so [mini ozone's
wait for node
restart](https://github.com/apache/ozone/blob/86e483d9b24de9b95bc7093e4ac0e2071e77e913/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneClusterImpl.java#L203)
won't catch this condition
##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/checksum/TestContainerCommandReconciliation.java:
##########
@@ -247,6 +251,35 @@ public void testGetChecksumInfoSuccess() throws Exception {
}
}
+ @Test
+ public void testDataChecksumReportedAtSCM() throws Exception {
+ long containerID = writeDataAndGetContainer(true);
+ // Check non-zero checksum after container close
+ Set<ContainerReplica> containerReplicas =
cluster.getStorageContainerManager().getContainerManager()
+ .getContainerReplicas(ContainerID.valueOf(containerID));
+ for (ContainerReplica containerReplica: containerReplicas) {
+ assertNotEquals(0, containerReplica.getDataChecksum());
+ }
+
cluster.getStorageContainerLocationClient().reconcileContainer(containerID);
+ Thread.sleep(10000);
Review Comment:
We should wait on a condition instead of sleeping.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]