[jira] [Updated] (HDDS-8062) Persist reason for container replica being marked unhealthy

Ethan Rose (Jira) Thu, 02 Mar 2023 09:44:06 -0800


     [ https://issues.apache.org/jira/browse/HDDS-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ethan Rose updated HDDS-8062:
-----------------------------
    Description: 
Once the scanner marks a container replica unhealthy, it would be helpful for 
debugging to persist the reason. Logging only to the main datanode log is not 
enough: those entries eventually roll off, and finding them requires extra 
filtering.

Reasons for marking unhealthy include:
* Corrupted block (and which block was corrupted)
* Corrupted container metadata file
* Volume failure

Some options for persisting the information are:
* Into the .container file itself.
** May not work if the container file is corrupted.
* To the datanode audit log
** Would get mixed up with client operations like put block.
* To a different file within the container
** This could be used to track the entire lifecycle of the container, like when 
it was created, closed, replicated, and marked unhealthy.
* To a dedicated log4j logger that can be configured to go to a different file.
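
As a rough illustration of the third option (a separate file within the container), the sketch below appends one line per lifecycle event to a file in the container directory. Everything in it is hypothetical and not existing Ozone code: the class name ContainerEventLog, the file name events.log, the tab-separated line format, and the enum values, which simply mirror the events and reasons listed above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;
import java.util.List;

/** Hypothetical per-container event log; a sketch, not Ozone's implementation. */
final class ContainerEventLog {

  /** Lifecycle events mentioned in the issue description. */
  enum Event { CREATED, CLOSED, REPLICATED, MARKED_UNHEALTHY }

  /** Reasons for marking a replica unhealthy, as listed in the issue. */
  enum UnhealthyReason { CORRUPT_BLOCK, CORRUPT_METADATA_FILE, VOLUME_FAILURE }

  /** Assumed file name inside the container directory. */
  static final String FILE_NAME = "events.log";

  private ContainerEventLog() { }

  /** Appends one "timestamp TAB event TAB detail" line, creating the file if needed. */
  static void append(Path containerDir, Event event, String detail)
      throws IOException {
    String line = Instant.now() + "\t" + event + "\t" + detail
        + System.lineSeparator();
    Files.write(containerDir.resolve(FILE_NAME),
        line.getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
  }

  /** Records why the replica was marked unhealthy, e.g. which block was corrupt. */
  static void markUnhealthy(Path containerDir, UnhealthyReason reason,
      String detail) throws IOException {
    append(containerDir, Event.MARKED_UNHEALTHY, reason + " " + detail);
  }

  /** Reads back all recorded events in the order they were written. */
  static List<String> read(Path containerDir) throws IOException {
    return Files.readAllLines(containerDir.resolve(FILE_NAME),
        StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    // A temporary directory stands in for a real container directory.
    Path dir = Files.createTempDirectory("container-1");
    append(dir, Event.CREATED, "-");
    markUnhealthy(dir, UnhealthyReason.CORRUPT_BLOCK, "blockID=42");
    read(dir).forEach(System.out::println);
  }
}
```

Because this file is separate from the .container file, it should stay readable when the container metadata file is corrupted, although it would presumably still be lost along with the rest of the container on a volume failure.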


> Persist reason for container replica being marked unhealthy
> -----------------------------------------------------------
>
>                 Key: HDDS-8062
>                 URL: https://issues.apache.org/jira/browse/HDDS-8062
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ethan Rose
>            Priority: Major



