[
https://issues.apache.org/jira/browse/HDDS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17657006#comment-17657006
]
Dave Teng commented on HDDS-7097:
---------------------------------
After manually testing on a local cluster to verify the log message, I could
not reproduce the first issue, "new unhealthy state is incorrectly logged as
the previous state". The previous state appears to be logged correctly. It may
therefore be a different issue that causes an already unhealthy container to
be marked unhealthy repeatedly. I will create a separate ticket if it happens
again and, if possible, document the context that triggers it.
> Container scanner log output lacks useful information
> -----------------------------------------------------
>
> Key: HDDS-7097
> URL: https://issues.apache.org/jira/browse/HDDS-7097
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ethan Rose
> Assignee: Dave Teng
> Priority: Major
> Labels: pull-request-available
>
> Currently the output from the container scanner may look like this
> {code}
> 2022-08-04 14:16:37,702 WARN
> org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer: Moving
> container
> /hadoop-ozone/datanode/data/hdds/CID-5612c780-06f8-4ac5-9eae-498159abd009/current/containerDir1/1008
> to state UNHEALTHY from state:UNHEALTHY
> Trace:java.base/java.lang.Thread.getStackTrace(Thread.java:1606)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
> org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.markContainerUnhealthy(KeyValueContainer.java:335)
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.markContainerUnhealthy(KeyValueHandler.java:1017)
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.markContainerUnhealthy(ContainerController.java:116)
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerDataScanner.runIteration(ContainerDataScanner.java:108)
> org.apache.hadoop.ozone.container.ozoneimpl.ContainerDataScanner.run(ContainerDataScanner.java:81)
> ...
> 2022-08-04 14:30:19,407 ERROR
> org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerCheck: Corruption
> detected in container: [2] Exception: [null]
> {code}
> There are numerous problems with this:
> - The previous container state is not logged. The new unhealthy state is
> incorrectly logged as the previous state.
> - The exception identifying the corruption only has its message printed. The
> exception object itself should be logged to better identify the failure and
> catch cases like above where there is no exception message (probably caused
> by a bug).
> - The stack trace of the call to {{KeyValueContainer#markContainerUnhealthy}}
> is logged, which is both verbose and not useful.
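
The points listed in the issue description could be illustrated with a small,
self-contained sketch. It is not the Ozone code: the class ScannerLogSketch,
its State enum, and the message wording are hypothetical, and
java.util.logging is used only to keep the example dependency-free (the
logging framework used by the datanode is not shown in this thread). It
captures the previous state before changing it, and passes the exception
object to the logger instead of only its message.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a container; NOT the real KeyValueContainer.
final class ScannerLogSketch {
    enum State { OPEN, CLOSED, UNHEALTHY }

    private static final Logger LOG =
        Logger.getLogger(ScannerLogSketch.class.getName());

    private final long id;
    private State state;

    ScannerLogSketch(long id, State initial) {
        this.id = id;
        this.state = initial;
    }

    State getState() {
        return state;
    }

    // Marks the container unhealthy and returns the logged message.
    String markUnhealthy(Exception cause) {
        // Capture the old state BEFORE mutating it, so the log line cannot
        // report the new state as the previous one.
        State previous = state;
        state = State.UNHEALTHY;
        String msg = "Moving container " + id + " to state " + state
            + " from state " + previous;
        // Pass the Throwable itself: its type and stack trace are logged
        // even when getMessage() is null. No extra stack trace of this
        // method's caller is appended to the message.
        LOG.log(Level.WARNING, msg, cause);
        return msg;
    }

    public static void main(String[] args) {
        ScannerLogSketch container = new ScannerLogSketch(2, State.CLOSED);
        // An exception with a null message, similar to "Exception: [null]".
        System.out.println(container.markUnhealthy(new IllegalStateException()));
    }
}
```

With this shape, an exception that has no message still shows up in the log
with its class name and stack trace, and the transition line names both the
old and the new state.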
--
This message was sent by Atlassian Jira
(v8.20.10#820010)