[jira] [Commented] (HDDS-9979) Sum of used, available, and reserved exceeds the physical volume size

Zita Dombi (Jira) Tue, 23 Jan 2024 05:19:06 -0800


    [ 
https://issues.apache.org/jira/browse/HDDS-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809940#comment-17809940
 ]


Zita Dombi commented on HDDS-9979:
----------------------------------

{quote} This cluster consists of 36 nodes. Each node has 24 HDD drives of 
14 TB each. The expected total capacity of a single node is: 36 bays * 
14 TB * 10^12 / 1024^4 = 458 TiB, so the sum of 
{{volume_info_metrics_{used,available,reserved}}} should be equal to 458 TiB.
{quote}
Regarding the first example, I'm not sure I completely understand this 
calculation [~ksugihara]. If each node has 24 HDD drives of 14 TB each, 
shouldn't it be 24 * 14 * 10^12 / 1024^4 = 305 TiB per node? And we would 
have 36 of these nodes?
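
To make the unit conversion concrete, here is a minimal sketch of the 
arithmetic. The helper name {{raw_capacity_tib}} is only illustrative and is 
not part of Ozone; it assumes decimal terabytes (10^12 bytes) per drive and 
ignores filesystem overhead.

```python
TB = 10**12      # bytes in a decimal terabyte, as used by drive vendors
TIB = 1024**4    # bytes in a binary tebibyte


def raw_capacity_tib(drives, drive_tb=14):
    """Raw capacity of one node in TiB for `drives` drives of `drive_tb` TB."""
    return drives * drive_tb * TB / TIB


# 12 and 24 are the per-node drive counts mentioned in the issue;
# 36 is the count used in the reporter's formula for cluster #1.
for drives in (12, 24, 36):
    print(f"{drives} drives x 14 TB = {raw_capacity_tib(drives):.1f} TiB")
```

With 24 drives this gives about 305.6 TiB per node, and 458 TiB only comes 
out when 36 drives per node are assumed.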
{quote}We set `hdds.datanode.dir.du.reserved.percent` to 0.2 to ensure 20% 
reserved space in DN, but some nodes reach full like the above image even 
though we have the restriction (Ozone exceeds the restriction and fills the 
reserved space as well).
{quote}
In these screenshots I don't see any example where the reserved space is 
less than 20% (to be sure, of course, we would need to know the exact 
capacity of each node). What I do see is a few nodes where the green bar is 
bigger than on the other nodes in the cluster; I don't see any smaller ones 
(unless all of them are smaller than they should be :)). To check this, it 
would be good to see the exact numbers for the second cluster. Could you 
attach them?
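
Once the exact numbers are available, a check along these lines could be run 
per volume. This is only a sketch: {{check_volume}} and its arguments are 
hypothetical names, not Ozone code, and it assumes the reserved target is 
simply the configured fraction (0.2 here) of the volume capacity.

```python
def check_volume(capacity, used, available, reserved, reserved_fraction=0.2):
    """Compare reported metrics (all in bytes) against the volume capacity.

    Assumption: the reserved target is reserved_fraction * capacity.
    """
    total = used + available + reserved
    return {
        # True when used + available + reserved is larger than the volume
        "sum_exceeds_capacity": total > capacity,
        # How many bytes the sum overshoots the capacity by (0 if none)
        "excess_bytes": max(0, total - capacity),
        # True when less space is reserved than the configured target
        "reserved_below_target": reserved < reserved_fraction * capacity,
    }


# Made-up numbers purely for illustration, not taken from the cluster.
print(check_volume(capacity=100, used=50, available=40, reserved=20))
```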

{quote}My quick fix is to allow the avail the minus value.{quote}
How could the available space be negative? 

Thanks in advance!

> Sum of used, available, and reserved exceeds the physical volume size
> ---------------------------------------------------------------------
>
>                 Key: HDDS-9979
>                 URL: https://issues.apache.org/jira/browse/HDDS-9979
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: DN
>    Affects Versions: 1.4.0
>            Reporter: Kohei Sugihara
>            Assignee: Zita Dombi
>            Priority: Major
>         Attachments: cluster1.png, cluster2.png, 
> image-2024-01-08-19-31-38-781.png
>
>
> While reviewing DN metrics, I noticed that the sum of Used, Available, and 
> Reserved differs from the actual volume size. I have not searched Jira 
> thoroughly for similar existing issues, so I would appreciate pointers to 
> any you know of. We experienced this issue in two clusters. Cluster #1 
> takes in a lot of data and has run out of disk space many times.
> h2. Example 1: Cluster #1
> This cluster consists of 36 nodes. Each node has 24 HDD drives of 14 TB 
> each. The expected total capacity of a single node is: 36 bays * 
> 14 TB * 10^12 / 1024^4 = 458 TiB, so the sum of 
> {{volume_info_metrics_{used,available,reserved}}} should be equal to 458 
> TiB. However, we see different results.
> The cluster1.png shows a stacked bar graph. The reported metrics vary and 
> exceed 458 TiB.
> !cluster1.png!
> h2. Example 2: Cluster #2
> This is another example; each node has 12 HDD drives of 14 TB each. The 
> expected total capacity of a single node is: 12 bays * 14 TB * 10^12 / 
> 1024^4 = 153 TiB.
> The cluster2.png shows a stacked bar graph. The reported metrics are almost 
> the same across DNs, but some exceptions exceed the physical capacity.
> !cluster2.png!



