Re: [PR] HDDS-11290. Container scanner should keep scanning after non-fatal errors [ozone]

via GitHub Thu, 17 Oct 2024 08:51:06 -0700


errose28 commented on code in PR #7127:
URL: https://github.com/apache/ozone/pull/7127#discussion_r1805031235



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/BackgroundContainerDataScanner.java:
##########
@@ -87,27 +87,30 @@ public void scanContainer(Container<?> c)
     ContainerData containerData = c.getContainerData();
     long containerId = containerData.getContainerID();
     logScanStart(containerData);
-    ScanResult result = c.scanData(throttler, canceler);
-
-    // Metrics for skipped containers should not be updated.
-    if (result.getFailureType() == DELETED_CONTAINER) {
-      LOG.error("Container [{}] has been deleted.",
-          containerId, result.getException());
-      return;
-    }
-    if (!result.isHealthy()) {
-      LOG.error("Corruption detected in container [{}]. Marking it UNHEALTHY.",
-          containerId, result.getException());
-      boolean containerMarkedUnhealthy = 
controller.markContainerUnhealthy(containerId, result);
-      if (containerMarkedUnhealthy) {
-        metrics.incNumUnHealthyContainers();
+    DataScanResult result = c.scanData(throttler, canceler);
+
+    if (result.isDeleted()) {
+      LOG.debug("Container [{}] has been deleted during the data scan.", 
containerId);
+    } else {
+      if (!result.isHealthy()) {
+        logUnhealthyScanResult(containerId, result, LOG);
+
+        // Only increment the number of unhealthy containers if the container 
was not already unhealthy.
+        if (controller.markContainerUnhealthy(containerId, result)) {
+          metrics.incNumUnHealthyContainers();

Review Comment:
   I think we should log unhealthy results on each iteration, even if the 
container was unhealthy on a previous iteration. The reason it is unhealthy may 
have changed.
   
   For the metric, the `controller.markContainerUnhealthy` call returns a 
boolean indicating if the container state was changed, which we use to 
increment the metric. Technically you could have an execution like this:
   1. Background scanner marks container unhealthy
   2. On-demand scanner moves it back to healthy
   3. Background scanner increments the metric
   
   I think this is expected behavior though, because the counts of each 
container in the metric are from the background scanner's point of view. Since 
it is a slow running process, states of containers it already scanned can 
change after it has scanned them but before it has scanned them again.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HDDS-11290. Container scanner should keep scanning after non-fatal errors [ozone]

Reply via email to