[jira] [Updated] (HDDS-8062) Persist reason for container replica being marked unhealthy

Ethan Rose (Jira) Wed, 28 Jun 2023 00:33:09 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ethan Rose updated HDDS-8062:
-----------------------------
    Status: Patch Available  (was: In Progress)

> Persist reason for container replica being marked unhealthy
> -----------------------------------------------------------
>
>                 Key: HDDS-8062
>                 URL: https://issues.apache.org/jira/browse/HDDS-8062
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ethan Rose
>            Assignee: Ethan Rose
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: container_log_v1.pdf
>
>
> Once a container replica is marked unhealthy by the scanner, it would be 
> helpful for debugging to persist why the container was marked unhealthy. 
> Messages written only to the main datanode log will eventually roll off, and 
> finding them requires more filtering to figure out what happened.
> Reasons for marking unhealthy include:
> * Corrupted block (and which block was corrupted)
> * Corrupted container metadata file
> * Volume failure
> Some options for persisting the information are:
> * Into the .container file itself.
> ** May not work if the container file is corrupted.
> * To the datanode audit log
> ** Would get mixed up with client operations like put block.
> * To a different file within the container
> ** This could be used to track the entire lifecycle of the container, like 
> when it was created, closed, replicated, and marked unhealthy.
> * To a dedicated log4j logger that can be configured to go to a different 
> file.
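
The last option, a dedicated logger routed to its own file, can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard logging module as a stand-in for log4j, and the logger name, the reason names, and the line format are hypothetical, not taken from the issue or from Ozone's code.

```python
import logging
from enum import Enum


class UnhealthyReason(Enum):
    """Hypothetical reason codes mirroring the list in the description."""
    CORRUPT_BLOCK = "CORRUPT_BLOCK"
    CORRUPT_CONTAINER_FILE = "CORRUPT_CONTAINER_FILE"
    VOLUME_FAILURE = "VOLUME_FAILURE"


def make_container_logger(log_path):
    """Return a logger that writes only to log_path.

    The logger name is an assumption made for this sketch.
    """
    logger = logging.getLogger("container.lifecycle")
    logger.setLevel(logging.INFO)
    # Do not forward records to the root logger, so these entries stay
    # out of the main log and are not mixed with other messages.
    logger.propagate = False
    # Drop handlers left over from an earlier call so each call ends up
    # with exactly one file handler.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s | %(levelname)s | %(message)s"))
    logger.addHandler(handler)
    return logger


def log_unhealthy(logger, container_id, reason, detail=""):
    """Record why a container replica was marked unhealthy."""
    logger.warning("ID=%d | State=UNHEALTHY | Reason=%s | %s",
                   container_id, reason.value, detail)
```

Calling log_unhealthy(logger, 12, UnhealthyReason.CORRUPT_BLOCK, "block=345") appends one line to the dedicated file. Because that file only receives container events, it can be given its own rotation and retention settings, separate from the main log.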



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

