[
https://issues.apache.org/jira/browse/HDDS-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809940#comment-17809940
]
Zita Dombi commented on HDDS-9979:
----------------------------------
{quote} This cluster is consisted from 36 nodes. Each node has 24 of 14 TB HDD
drives. Expected total capacity per a single node is calculated by: 36 bays *
14 TB * 10^12 / 1024^4 = 458 TiB, so the sum of
volume_info_metrics_{used,available,reserved} should be equal to 458 TiB.
{quote}
Regarding the first example, I'm not sure I completely understand this
calculation [~ksugihara]. If each node has 24 14 TB HDD drives, shouldn't the
per-node capacity be 24 * 14 * 10^12 / 1024^4 = 305 TiB, with 36 such nodes in
the cluster?
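To make the two readings of the calculation easy to compare, here is a minimal sketch of the TB-to-TiB conversion. The function name `raw_capacity_tib` is only illustrative, and the result is raw drive capacity, ignoring any filesystem overhead.

```python
TB = 10**12     # decimal terabyte, as printed on drive labels
TIB = 1024**4   # binary tebibyte


def raw_capacity_tib(drives: int, tb_per_drive: float) -> float:
    """Raw capacity of one node in TiB (hypothetical helper, no FS overhead)."""
    return drives * tb_per_drive * TB / TIB


# Drive counts mentioned in this issue: 12 and 24 per node, plus the 36 used
# in the original calculation.
for drives in (12, 24, 36):
    print(f"{drives} x 14 TB = {raw_capacity_tib(drives, 14):.1f} TiB")
```

With 24 drives per node this gives roughly 305 TiB; the 458 TiB figure only appears when 36 is used as the drive count.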
{quote}We set `hdds.datanode.dir.du.reserved.percent` to 0.2 to ensure 20%
reserved space in DN, but some nodes reach full like the above image even
though we have the restriction (Ozone exceeds the restriction and fills the
reserved space as well).
{quote}
In these screenshots I don't see an example of this, i.e. a node where the
reserved space is less than 20% (to be sure, of course, we would need to know
the exact capacity of each node). What I see is a few nodes where the green bar
is bigger than on the other nodes in the cluster; I don't see any smaller ones
(unless all of them are smaller than they should be :)). To calculate this, it
would be good to see the exact numbers for the second cluster. Could you attach
them?
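With exact numbers per node, the check could be sketched roughly as below. This is only an illustration: `NodeSpace` and `check_node` are hypothetical names, the sample values are made up, and it assumes the reserved percentage is applied to the physical capacity of the node.

```python
from dataclasses import dataclass


@dataclass
class NodeSpace:
    capacity: float   # physical capacity, e.g. in TiB
    used: float
    available: float
    reserved: float


def check_node(node: NodeSpace, reserved_fraction: float = 0.2) -> dict:
    """Compare reported metrics with the physical capacity (hypothetical helper)."""
    expected_reserved = node.capacity * reserved_fraction
    free = node.capacity - node.used
    return {
        # > 0 means used + available + reserved exceeds the physical size
        "sum_minus_capacity": node.used + node.available + node.reserved - node.capacity,
        "expected_reserved": expected_reserved,
        # False means data has been written into the reserved space
        "reserve_intact": free >= expected_reserved,
    }


# Made-up example values, not taken from the attached screenshots.
print(check_node(NodeSpace(capacity=100, used=85, available=0, reserved=20)))
```

A node with `reserve_intact` set to False would be an example of the restriction being exceeded.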
{quote}My quick fix is to allow the avail the minus value.{quote}
How could the available space be negative?
Thanks in advance!
> Sum of used, available, and reserved exceeds the physical volume size
> ---------------------------------------------------------------------
>
> Key: HDDS-9979
> URL: https://issues.apache.org/jira/browse/HDDS-9979
> Project: Apache Ozone
> Issue Type: Bug
> Components: DN
> Affects Versions: 1.4.0
> Reporter: Kohei Sugihara
> Assignee: Zita Dombi
> Priority: Major
> Attachments: cluster1.png, cluster2.png,
> image-2024-01-08-19-31-38-781.png
>
>
> While reviewing DN metrics, I noticed that the sum of Used, Available, and
> Reserved differs from the actual volume size. I haven't searched Jira
> thoroughly for similar existing issues, so I'd appreciate it if you could
> point me to any you know of. We experienced this issue in two clusters.
> Cluster #1 takes in a lot of data and has experienced full disks many times.
> h2. Example 1: Cluster #1
> This cluster is consisted from 36 nodes. Each node has 24 of 14 TB HDD
> drives. Expected total capacity per a single node is calculated by: 36 bays *
> 14 TB * 10^12 / 1024^4 = 458 TiB, so the sum of
> {{volume_info_metrics_{used,available,reserved}}} should be equal to 458
> TiB. However, we see different results.
> The cluster1.png shows a stacked bar graph. The reported metrics vary and
> exceed 458 TiB.
> !cluster1.png!
> h2. Example 2: Cluster #2
> This is another example and each node has 12 of 14 TB HDD drives. Expected
> total capacity per a single node is calculated by: 12 bays * 14 TB * 10^12 /
> 1024^4 = 153 TiB.
> The cluster2.png shows a stacked bar graph. The reported metrics are almost
> the same among DNs, but some exceptions exceed the physical capacity.
> !cluster2.png!