[
https://issues.apache.org/jira/browse/HDDS-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason O'Sullivan updated HDDS-14084:
------------------------------------
Description:
The *Data Node selection process* for forming new *Ratis pipelines* currently
*silently ignores* nodes that do not meet the minimum space requirements for
either metadata or data writes. This behaviour makes it difficult for operators
to quickly identify and troubleshoot Data Nodes that are failing to join
pipelines.
The following properties are used by the *Storage Container Manager (SCM)*
during node selection to enforce minimum space requirements:
* *Metadata Write Space:*
** {{ozone.scm.datanode.ratis.volume.free-space.min}} (Minimum free space
required on the Data Node's disk volume for *Ratis metadata*.)
* *Data Write Space:*
** {{ozone.scm.container.size}} (The target size of a container, which acts as
the effective *minimum free space* required to allocate a new container for
data.)
Currently, the reason for ignoring a Data Node is only recorded at the *DEBUG*
log level.
The proposal is to raise the log level for these specific space-check failures
to *INFO* or *WARN* within the SCM node selection logic. This change will
ensure that Data Nodes with insufficient write space (for either metadata or
data) are *highlighted sooner* in the logs, providing immediate visibility to
operators without requiring increased log verbosity for the entire system.
*Note:* While this change offers better visibility, it may lead to an increase
in log output under heavy operation or when a large number of Data Nodes are
consistently low on space.
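As a rough illustration of the proposal, the two space checks and the raised log level could look like the minimal sketch below. It is not the actual SCM node selection code: the class {{SpaceCheckSketch}}, the type {{NodeSpace}}, the method {{selectNodes}} and the log message wording are all hypothetical, and {{java.util.logging}} is used only to keep the example self-contained. The strict less-than comparison is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical sketch; not the actual SCM node selection code. */
final class SpaceCheckSketch {
  private static final Logger LOG =
      Logger.getLogger(SpaceCheckSketch.class.getName());

  /** Hypothetical snapshot of the free space a Data Node reports, in bytes. */
  static final class NodeSpace {
    final String hostname;
    final long metadataFreeBytes;
    final long dataFreeBytes;

    NodeSpace(String hostname, long metadataFreeBytes, long dataFreeBytes) {
      this.hostname = hostname;
      this.metadataFreeBytes = metadataFreeBytes;
      this.dataFreeBytes = dataFreeBytes;
    }
  }

  /**
   * Returns the candidates that pass both space checks.
   * minMetadataFreeBytes stands in for
   * ozone.scm.datanode.ratis.volume.free-space.min and containerSizeBytes
   * for ozone.scm.container.size (assumed to be already parsed to bytes).
   */
  static List<NodeSpace> selectNodes(List<NodeSpace> candidates,
      long minMetadataFreeBytes, long containerSizeBytes) {
    List<NodeSpace> selected = new ArrayList<>();
    for (NodeSpace node : candidates) {
      if (node.metadataFreeBytes < minMetadataFreeBytes) {
        // Logged at WARNING instead of a debug level so operators see it.
        LOG.warning(String.format(
            "Skipping %s: %d bytes free for Ratis metadata, %d required",
            node.hostname, node.metadataFreeBytes, minMetadataFreeBytes));
        continue;
      }
      if (node.dataFreeBytes < containerSizeBytes) {
        LOG.warning(String.format(
            "Skipping %s: %d bytes free for data, %d required",
            node.hostname, node.dataFreeBytes, containerSizeBytes));
        continue;
      }
      selected.add(node);
    }
    return selected;
  }
}
```

In this sketch every skipped node produces one log line per selection attempt, which is where the extra log volume mentioned in the note would come from.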
was:
The *Data Node selection process* for forming new *Ratis pipelines* currently
*silently ignores* nodes that do not meet the minimum space requirements for
either metadata or data writes. This behaviour makes it difficult for operators
to quickly identify and troubleshoot Data Nodes that are failing to join
pipelines.
The following properties are used by the *Storage Container Manager (SCM)*
during node selection to enforce minimum space requirements:
* *Metadata Write Space:*
** {{ozone.scm.datanode.ratis.volume.free-space.min}} (Minimum free space
required on the Data Node's disk volume for *Ratis metadata*.)
* *Data Write Space:*
** {{ozone.scm.container.size}} (The target size of a container, which acts as
the effective *minimum free space* required to allocate a new container for
data.)
Currently, the reason for ignoring a Data Node is only recorded at the *DEBUG*
log level.
We propose raising the log level for these specific space-check failures to
*INFO* or *WARN* within the SCM node selection logic. This change will ensure
that Data Nodes with insufficient write space (for either metadata or data) are
*highlighted sooner* in the logs, providing immediate visibility to operators
without requiring increased log verbosity for the entire system.
*Note:* While this change offers better visibility, it may lead to an increase
in log output under heavy operation or when a large number of Data Nodes are
consistently low on space.
> Highlight Data Nodes with Insufficient Write Space
> --------------------------------------------------
>
> Key: HDDS-14084
> URL: https://issues.apache.org/jira/browse/HDDS-14084
> Project: Apache Ozone
> Issue Type: Bug
> Affects Versions: 2.0.0, 1.4.1, 2.1.0
> Reporter: Jason O'Sullivan
> Assignee: Jason O'Sullivan
> Priority: Minor
> Fix For: 2.2.0
>
>
> The *Data Node selection process* for forming new *Ratis pipelines* currently
> *silently ignores* nodes that do not meet the minimum space requirements for
> either metadata or data writes. This behaviour makes it difficult for
> operators to quickly identify and troubleshoot Data Nodes that are failing to
> join pipelines.
> The following properties are used by the *Storage Container Manager (SCM)*
> during node selection to enforce minimum space requirements:
> * *Metadata Write Space:*
> ** {{ozone.scm.datanode.ratis.volume.free-space.min}} (Minimum free space
> required on the Data Node's disk volume for *Ratis metadata*.)
> * *Data Write Space:*
> ** {{ozone.scm.container.size}} (The target size of a container, which acts
> as the effective *minimum free space* required to allocate a new container
> for data.)
> Currently, the reason for ignoring a Data Node is only recorded at the
> *DEBUG* log level.
> The proposal is to raise the log level for these specific space-check
> failures to *INFO* or *WARN* within the SCM node selection logic. This change
> will ensure that Data Nodes with insufficient write space (for either
> metadata or data) are *highlighted sooner* in the logs, providing immediate
> visibility to operators without requiring increased log verbosity for the
> entire system.
> *Note:* While this change offers better visibility, it may lead to an
> increase in log output under heavy operation or when a large number of Data
> Nodes are consistently low on space.