[jira] [Commented] (HDDS-9979) Sum of used, available, and reserved exceeds the physical volume size

Kohei Sugihara (Jira) Tue, 09 Jan 2024 18:59:15 -0800


    [ https://issues.apache.org/jira/browse/HDDS-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804955#comment-17804955 ]


Kohei Sugihara commented on HDDS-9979:
--------------------------------------

We set `hdds.datanode.dir.du.reserved.percent` to 0.2 to reserve 20% of the
space on each DN, but some nodes still become full, as shown in the image
above, despite this restriction (Ozone exceeds the limit and fills the reserved
space as well).

The problem is that the metrics are not reported correctly when the available
space drops below zero. The current implementation reports Reserved as-is from
the configured value, so it implicitly assumes that Available never goes
negative. My quick fix is to allow Available to take a negative value.
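A minimal sketch of the accounting problem described above, assuming that the reserved space is computed as `capacity * reserved_percent` and that Available is derived as `capacity - used - reserved`. The names `VolumeReport` and `report` are hypothetical and are not Ozone's actual (Java) API. Once Used has grown into the reserved area, clamping Available at zero makes Used + Available + Reserved exceed the capacity, while leaving Available negative keeps the sum equal to the capacity.

```python
from dataclasses import dataclass


@dataclass
class VolumeReport:
    """Hypothetical per-volume metrics, in bytes."""
    used: int
    available: int
    reserved: int

    def total(self) -> int:
        return self.used + self.available + self.reserved


def report(capacity: int, used: int, reserved_percent: float,
           clamp_available: bool) -> VolumeReport:
    # Assumption: reserved space is a fixed fraction of the volume capacity
    # and is reported as-is from the configuration.
    reserved = int(capacity * reserved_percent)
    available = capacity - used - reserved
    if clamp_available:
        # Clamping hides the overrun into the reserved area, so the three
        # values no longer add up to the capacity.
        available = max(available, 0)
    return VolumeReport(used=used, available=available, reserved=reserved)


TB = 10**12
capacity = 14 * TB        # one 14 TB drive
used = int(12.5 * TB)     # data has spilled into the 20% reserved area

clamped = report(capacity, used, 0.2, clamp_available=True)
unclamped = report(capacity, used, 0.2, clamp_available=False)
print(clamped.total() - capacity)  # 1300000000000 bytes over the capacity
print(unclamped.available)         # -1300000000000
```

With the negative value allowed, the three metrics always sum to the capacity, at the cost of dashboards having to handle a negative Available.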

> Sum of used, available, and reserved exceeds the physical volume size
> ---------------------------------------------------------------------
>
>                 Key: HDDS-9979
>                 URL: https://issues.apache.org/jira/browse/HDDS-9979
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: DN
>    Affects Versions: 1.4.0
>            Reporter: Kohei Sugihara
>            Assignee: Zita Dombi
>            Priority: Major
>         Attachments: cluster1.png, cluster2.png, 
> image-2024-01-08-19-31-38-781.png
>
>
> While reviewing DN metrics, I noticed that the sum of Used, Available, and
> Reserved differs from the actual volume size. I have not searched Jira
> thoroughly for similar existing issues, so I would appreciate it if you could
> point me to any you know of. We experienced this issue in two clusters.
> Cluster #1 ingests a large amount of data and has run out of disk space many
> times.
> h2. Example 1: Cluster #1
> This cluster consists of 36 nodes. Each node has 24 HDDs of 14 TB each. The
> expected total capacity per node is calculated as 36 bays * 14 * 10^12 bytes /
> 1024^4 = 458 TiB, so the sum of {{volume_info_metrics_used}},
> {{volume_info_metrics_available}}, and {{volume_info_metrics_reserved}} should
> equal 458 TiB. However, we observe different results.
> The cluster1.png attachment shows a stacked bar graph. The reported values
> vary across nodes and exceed 458 TiB.
> !cluster1.png!
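The expected per-node capacity can be reproduced with a short calculation. This sketch takes the bay count and drive size used in the formulas (36 bays for cluster #1, 12 bays for cluster #2, 14 TB drives) and converts decimal terabytes (10^12 bytes) to binary tebibytes (1024^4 bytes); the function name `expected_capacity_tib` is hypothetical.

```python
def expected_capacity_tib(bays: int, drive_tb: int) -> float:
    """Raw capacity of one node in TiB, given drive sizes in decimal TB."""
    total_bytes = bays * drive_tb * 10**12  # drive vendors quote TB = 10^12 bytes
    return total_bytes / 1024**4            # 1 TiB = 2^40 bytes


print(f"{expected_capacity_tib(36, 14):.1f} TiB")  # 458.4 TiB
print(f"{expected_capacity_tib(12, 14):.1f} TiB")  # 152.8 TiB
```

The gap between "14 TB" and roughly 12.7 TiB per drive is only the unit conversion; it does not by itself explain sums that exceed the physical capacity.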
> h2. Example 2: Cluster #2
> This is another example; each node has 12 HDDs of 14 TB each. The expected
> total capacity per node is calculated as 12 bays * 14 * 10^12 bytes / 1024^4 =
> 153 TiB.
> The cluster2.png attachment shows a stacked bar graph. The reported metrics
> are almost the same across DNs, but a few nodes exceed the physical capacity.
> !cluster2.png!
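A minimal sketch of the check behind both graphs: for each DN, add the reported Used, Available, and Reserved values and flag the node when the sum exceeds its expected physical capacity. The function `nodes_over_capacity` and the sample values are hypothetical illustration data, not numbers read from the attachments; in practice the inputs would be the volume_info_metrics_used, _available, and _reserved series summed over each node's volumes.

```python
TIB = 1024**4


def nodes_over_capacity(per_node: dict[str, dict[str, int]],
                        capacity_bytes: int) -> list[str]:
    """Return the DNs whose used + available + reserved exceeds capacity."""
    flagged = []
    for node, m in sorted(per_node.items()):
        total = m["used"] + m["available"] + m["reserved"]
        if total > capacity_bytes:
            flagged.append(node)
    return flagged


# Hypothetical sample values in bytes (not taken from cluster2.png).
capacity = 12 * 14 * 10**12  # 12 bays of 14 TB drives, about 152.8 TiB
sample = {
    "dn1": {"used": 100 * TIB, "available": 20 * TIB, "reserved": 30 * TIB},
    "dn2": {"used": 125 * TIB, "available": 0, "reserved": 30 * TIB},
}
print(nodes_over_capacity(sample, capacity))  # ['dn2']
```

Here dn2 has consumed part of its reserved space while Available is reported as 0, so its sum (155 TiB) exceeds the physical capacity, matching the pattern described for the outlier DNs.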



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
