[jira] [Commented] (HDDS-8062) Persist reason for container replica being marked unhealthy

Ritesh Shukla (Jira) Thu, 02 Mar 2023 00:12:05 -0800


    [ 
https://issues.apache.org/jira/browse/HDDS-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695532#comment-17695532
 ]


Ritesh Shukla commented on HDDS-8062:
-------------------------------------

Maybe to a different audit log? There needs to be a retention policy for 
tracking container state changes, because for debugging we might need to go 
back in time quite a bit, including for nodes that may have lost drives or 
disappeared entirely. 

> Persist reason for container replica being marked unhealthy
> -----------------------------------------------------------
>
>                 Key: HDDS-8062
>                 URL: https://issues.apache.org/jira/browse/HDDS-8062
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ethan Rose
>            Priority: Major
>
> Once a container replica is marked unhealthy by the scanner, it would be 
> helpful for debugging to persist why the container was marked unhealthy. 
> Messages written only to the main datanode log will eventually roll off, and 
> finding them requires more filtering to figure out what happened.
> Reasons for marking unhealthy include:
> - Corrupted block (and which block was corrupted)
> - Corrupted container metadata file
> - Volume failure
> Some options for persisting the information are:
> - Into the .container file itself.
>     - May not work if the container file is corrupted.
> - To the datanode audit log
>     - Would get mixed up with client operations like put block.
> - To a different file within the container
>     - This could be used to track the entire lifecycle of the container, like 
>       when it was created, closed, replicated, and marked unhealthy.
> - To a dedicated log4j logger that can be configured to go to a different 
>   file.
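
The last option above, a dedicated log4j logger routed to its own file, might 
be configured roughly as follows. This is only a sketch in log4j 1.x 
properties syntax, not something proposed in the issue: the logger name 
ContainerHealthLog, the appender name CONTAINERHEALTH, the file name, and the 
size and backup limits are all hypothetical, and it assumes the hadoop.log.dir 
system property is set for the datanode process.

```
# Hypothetical logger name; datanode code would have to log to it explicitly.
log4j.logger.ContainerHealthLog=INFO, CONTAINERHEALTH
# Keep these entries out of the main datanode log.
log4j.additivity.ContainerHealthLog=false

# Hypothetical appender writing to its own rolling file.
log4j.appender.CONTAINERHEALTH=org.apache.log4j.RollingFileAppender
log4j.appender.CONTAINERHEALTH.File=${hadoop.log.dir}/dn-container-health.log
# Illustrative retention limits only; choose values to match how far back
# debugging needs to reach.
log4j.appender.CONTAINERHEALTH.MaxFileSize=64MB
log4j.appender.CONTAINERHEALTH.MaxBackupIndex=20
log4j.appender.CONTAINERHEALTH.layout=org.apache.log4j.PatternLayout
log4j.appender.CONTAINERHEALTH.layout.ConversionPattern=%d{ISO8601} %p %m%n
```

With a setup like this, retention is bounded by MaxFileSize times 
MaxBackupIndex rather than by the main log's rollover, and the file lives on 
the log volume rather than inside the container, so it would survive a 
corrupted .container file.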



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
