errose28 commented on code in PR #7127:
URL: https://github.com/apache/ozone/pull/7127#discussion_r1805031235
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/BackgroundContainerDataScanner.java:
##########
@@ -87,27 +87,30 @@ public void scanContainer(Container<?> c)
ContainerData containerData = c.getContainerData();
long containerId = containerData.getContainerID();
logScanStart(containerData);
- ScanResult result = c.scanData(throttler, canceler);
-
- // Metrics for skipped containers should not be updated.
- if (result.getFailureType() == DELETED_CONTAINER) {
- LOG.error("Container [{}] has been deleted.",
- containerId, result.getException());
- return;
- }
- if (!result.isHealthy()) {
- LOG.error("Corruption detected in container [{}]. Marking it UNHEALTHY.",
- containerId, result.getException());
- boolean containerMarkedUnhealthy =
controller.markContainerUnhealthy(containerId, result);
- if (containerMarkedUnhealthy) {
- metrics.incNumUnHealthyContainers();
+ DataScanResult result = c.scanData(throttler, canceler);
+
+ if (result.isDeleted()) {
+ LOG.debug("Container [{}] has been deleted during the data scan.",
containerId);
+ } else {
+ if (!result.isHealthy()) {
+ logUnhealthyScanResult(containerId, result, LOG);
+
+ // Only increment the number of unhealthy containers if the container
was not already unhealthy.
+ if (controller.markContainerUnhealthy(containerId, result)) {
+ metrics.incNumUnHealthyContainers();
Review Comment:
I think we should log unhealthy results on each iteration, even if the
container was unhealthy on a previous iteration. The reason it is unhealthy may
have changed.
For the metric, the `controller.markContainerUnhealthy` call returns a
boolean indicating if the container state was changed, which we use to
increment the metric. Technically you could have an execution like this:
1. Background scanner marks container unhealthy
2. On-demand scanner moves it back to healthy
3. Background scanner increments the metric
I think this is expected behavior though, because the counts of each
container in the metric are from the background scanner's point of view. Since
it is a slow running process, states of containers it already scanned can
change after it has scanned them but before it has scanned them again.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]