[ https://issues.apache.org/jira/browse/HDDS-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason O'Sullivan updated HDDS-14084:
------------------------------------
    Description: 
The *Data Node selection process* for forming new *Ratis pipelines* currently 
*silently ignores* nodes that do not meet the minimum space requirements for 
either metadata or data writes. This behaviour makes it difficult for operators 
to quickly identify and troubleshoot Data Nodes that are failing to join 
pipelines.

The following properties are used by the *Storage Container Manager (SCM)* 
during node selection to enforce minimum space requirements:
 * *Metadata Write Space:*
 ** {{ozone.scm.datanode.ratis.volume.free-space.min}} (Minimum free space 
required on the Data Node's disk volume for *Ratis metadata*.)
 * *Data Write Space:*
 ** {{ozone.scm.container.size}} (The target size of a container, which acts as 
the effective *minimum free space* required to allocate a new container for 
data.)
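
As a rough illustration of the two checks listed above, the following Java sketch compares a node's free space against both thresholds and returns a reason when either fails. It is not the SCM implementation: the class, method, and parameter names are hypothetical, the byte values stand in for the two configuration properties, and the strict less-than comparison is an assumption.

```java
import java.util.Optional;

// Hypothetical sketch; names are illustrative and do not come from Ozone.
final class SpaceCheckSketch {
    private SpaceCheckSketch() { }

    /**
     * Returns a human-readable reason if the node fails either threshold,
     * or an empty Optional if it has enough space for both.
     *
     * minMetadataFreeBytes stands in for
     *   ozone.scm.datanode.ratis.volume.free-space.min
     * containerSizeBytes stands in for
     *   ozone.scm.container.size
     */
    static Optional<String> rejectionReason(String node,
            long metadataFreeBytes, long minMetadataFreeBytes,
            long dataFreeBytes, long containerSizeBytes) {
        // Assumption: a node is rejected when free space is strictly below the threshold.
        if (metadataFreeBytes < minMetadataFreeBytes) {
            return Optional.of(String.format(
                "%s: Ratis metadata space too low (%d bytes free, %d required)",
                node, metadataFreeBytes, minMetadataFreeBytes));
        }
        if (dataFreeBytes < containerSizeBytes) {
            return Optional.of(String.format(
                "%s: container data space too low (%d bytes free, %d required)",
                node, dataFreeBytes, containerSizeBytes));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // Illustrative numbers only: 2 GB metadata free, 3 GB data free.
        System.out.println(rejectionReason("dn-1", 2 * gb, gb, 3 * gb, 5 * gb));
    }
}
```

Returning the reason as a value, rather than logging inside the check, keeps the choice of log level with the caller.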

Currently, the reason for ignoring a Data Node is only recorded at the *DEBUG* 
log level.

The proposal is to raise the log level for these specific space-check failures 
to *INFO* or *WARN* within the SCM node selection logic. This change will 
ensure that Data Nodes with insufficient write space (for either metadata or 
data) are *highlighted sooner* in the logs, providing immediate visibility to 
operators without requiring increased log verbosity for the entire system.
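
To show why the log level matters here, the sketch below uses {{java.util.logging}} as a stand-in for the logging framework used by SCM (an assumption made so the example needs only the JDK; {{FINE}} plays the role of DEBUG and {{WARNING}} the role of WARN). With the logger left at an INFO threshold, the same rejection message is dropped at the lower level and kept at the higher one. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch; java.util.logging stands in for the real logging framework.
final class LogLevelSketch {
    private LogLevelSketch() { }

    /** Collects published records so the effect of the level is observable. */
    static final class CapturingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            messages.add(record.getLevel() + " " + record.getMessage());
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /**
     * Logs one rejection reason at the given level through a logger whose
     * threshold is INFO, and returns whatever actually reached the handler.
     */
    static List<String> logRejection(Level level, String reason) {
        Logger logger = Logger.getAnonymousLogger();
        logger.setUseParentHandlers(false); // keep the demo output out of the console
        logger.setLevel(Level.INFO);        // typical production threshold (assumption)
        CapturingHandler handler = new CapturingHandler();
        logger.addHandler(handler);
        logger.log(level, reason);
        return handler.messages;
    }

    public static void main(String[] args) {
        String reason = "dn-1 skipped: insufficient write space";
        System.out.println("at FINE:    " + logRejection(Level.FINE, reason));
        System.out.println("at WARNING: " + logRejection(Level.WARNING, reason));
    }
}
```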

*Note:* While this change offers better visibility, it may lead to an increase 
in log output under heavy operation or when a large number of Data Nodes are 
consistently low on space.

  was:
The *Data Node selection process* for forming new *Ratis pipelines* currently 
*silently ignores* nodes that do not meet the minimum space requirements for 
either metadata or data writes. This behaviour makes it difficult for operators 
to quickly identify and troubleshoot Data Nodes that are failing to join 
pipelines.



The following properties are used by the *Storage Container Manager (SCM)* 
during node selection to enforce minimum space requirements:
 * *Metadata Write Space:*

 ** {{ozone.scm.datanode.ratis.volume.free-space.min}} (Minimum free space 
required on the Data Node's disk volume for {*}Ratis metadata{*}.)

 * *Data Write Space:*

 ** {{ozone.scm.container.size}} (The target size of a container, which acts as 
the effective *minimum free space* required to allocate a new container for 
data.)

Currently, the reason for ignoring a Data Node is only recorded at the *DEBUG* 
log level.

We propose raising the log level for these specific space-check failures to 
*INFO* or *WARN* within the SCM node selection logic. This change will ensure 
that Data Nodes with insufficient write space (for either metadata or data) are 
*highlighted sooner* in the logs, providing immediate visibility to operators 
without requiring increased log verbosity for the entire system.



*Note:* While this change offers better visibility, it may lead to an increase 
in log output under heavy operation or when a large number of Data Nodes are 
consistently low on space.


> Highlight Data Nodes with Insufficient Write Space
> --------------------------------------------------
>
>                 Key: HDDS-14084
>                 URL: https://issues.apache.org/jira/browse/HDDS-14084
>             Project: Apache Ozone
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.4.1, 2.1.0
>            Reporter: Jason O'Sullivan
>            Assignee: Jason O'Sullivan
>            Priority: Minor
>             Fix For: 2.2.0
>
>
> The *Data Node selection process* for forming new *Ratis pipelines* currently 
> *silently ignores* nodes that do not meet the minimum space requirements for 
> either metadata or data writes. This behaviour makes it difficult for 
> operators to quickly identify and troubleshoot Data Nodes that are failing to 
> join pipelines.
> The following properties are used by the *Storage Container Manager (SCM)* 
> during node selection to enforce minimum space requirements:
>  * *Metadata Write Space:*
>  ** {{ozone.scm.datanode.ratis.volume.free-space.min}} (Minimum free space 
> required on the Data Node's disk volume for *Ratis metadata*.)
>  * *Data Write Space:*
>  ** {{ozone.scm.container.size}} (The target size of a container, which acts 
> as the effective *minimum free space* required to allocate a new container 
> for data.)
> Currently, the reason for ignoring a Data Node is only recorded at the 
> *DEBUG* log level.
> The proposal is to raise the log level for these specific space-check 
> failures to *INFO* or *WARN* within the SCM node selection logic. This change 
> will ensure that Data Nodes with insufficient write space (for either 
> metadata or data) are *highlighted sooner* in the logs, providing immediate 
> visibility to operators without requiring increased log verbosity for the 
> entire system.
> *Note:* While this change offers better visibility, it may lead to an 
> increase in log output under heavy operation or when a large number of Data 
> Nodes are consistently low on space.



