[
https://issues.apache.org/jira/browse/HDDS-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695532#comment-17695532
]
Ritesh Shukla commented on HDDS-8062:
-------------------------------------
Maybe to a different audit log? There needs to be a retention policy for
tracking container state changes. For debugging, we might need to go back quite
far in time, including for nodes that have lost drives or disappeared entirely.
> Persist reason for container replica being marked unhealthy
> -----------------------------------------------------------
>
> Key: HDDS-8062
> URL: https://issues.apache.org/jira/browse/HDDS-8062
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ethan Rose
> Priority: Major
>
> Once a container replica is marked unhealthy by the scanner, it would be
> helpful for debugging to persist why the container was marked unhealthy.
> Messages written only to the main datanode log will eventually roll off, and
> finding them would require more filtering to figure out what happened.
> Reasons for marking unhealthy include:
> - Corrupted block (and which block was corrupted)
> - Corrupted container metadata file
> - Volume failure
> Some options for persisting the information are:
> - Into the .container file itself.
>   - May not work if the container file is corrupted.
> - To the datanode audit log
>   - Would get mixed up with client operations like put block.
> - To a different file within the container
>   - This could be used to track the entire lifecycle of the container, like
>     when it was created, closed, replicated, and marked unhealthy.
> - To a dedicated log4j logger that can be configured to go to a different
>   file.
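
For illustration only, here is a rough sketch of the last option (a dedicated
logger routed to its own file). It uses java.util.logging as a stand-in so the
snippet is self-contained; the issue itself proposes log4j. The class name, the
logger name, the record format, and the rotation limits are all hypothetical
and do not come from Ozone.

```java
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical sketch; none of these names come from Ozone. */
final class ContainerHealthLog {
  /** The reasons listed in the issue description. */
  enum Reason { CORRUPT_BLOCK, CORRUPT_CONTAINER_FILE, VOLUME_FAILURE }

  // Dedicated logger, separate from the main datanode logger.
  private static final Logger LOG = Logger.getLogger("ContainerHealthLog");

  private ContainerHealthLog() { }

  /** Routes the dedicated logger to its own rotating file. */
  static void init(String filePattern) throws IOException {
    // Assumed retention: up to 10 files of 16 MB each, appending on restart.
    FileHandler handler =
        new FileHandler(filePattern, 16 * 1024 * 1024, 10, true);
    handler.setFormatter(new Formatter() {
      @Override
      public String format(LogRecord r) {
        return r.getMillis() + " " + r.getMessage() + System.lineSeparator();
      }
    });
    // Keep these entries out of the root (main) log.
    LOG.setUseParentHandlers(false);
    LOG.addHandler(handler);
    LOG.setLevel(Level.INFO);
  }

  /** Builds one record; detail can carry, e.g., the corrupted block ID. */
  static String describe(long containerId, Reason reason, String detail) {
    return "container=" + containerId + " state=UNHEALTHY reason=" + reason
        + " detail=" + detail;
  }

  /** Writes one record to the dedicated log. */
  static void markedUnhealthy(long containerId, Reason reason, String detail) {
    LOG.info(describe(containerId, reason, detail));
  }
}
```

With log4j the same separation would presumably be done in configuration (a
named logger with its own file appender and additivity turned off), so the
retention policy could be tuned independently of the main datanode log.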