[ 
https://issues.apache.org/jira/browse/HDDS-10721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Rose updated HDDS-10721:
------------------------------
    Description: 
Jira to track multiple minor improvements to space reservation configurations 
on datanode disks/volumes that can make them easier to work with.

For context, Ozone should not allow datanode volumes to get 100% full. A full 
volume can cause the drive to "lock up": operations like block delete that 
would free up space must first append to the RocksDB WAL, so they need extra 
disk space before they can complete. Once encountered, such issues are 
difficult to resolve.

These are a few of the issues I have encountered so far when using these 
configurations. If others have improvement ideas, please add them in the 
comments.
 * There is no way to specify a default space reservation in bytes, only as a 
percentage.
 ** This is because {{hdds.datanode.dir.du.reserved}} requires a mapping of 
volume paths to space reserved values, meaning it cannot take a default value.
 ** Since the config is a string, one option is to update it to support either 
a volume name mapping, or a single byte value that applies to all volumes. This 
way we can give it a default value that can be refined on a per-volume basis if 
needed.

 * Invalid space reservation configurations fall back to a default value 
instead of failing datanode startup.
 ** The fallback is only reported in log messages that are easy to miss, so the 
admin will probably not be aware they are running with an undesired reserved 
space setting until they encounter disk full issues. Since the node was already 
stopped to adjust the configuration/load new volumes, a hard failure is 
actually more user-friendly in this case.
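
To make the two proposals above concrete, here is a rough Java sketch of how a single string config could accept either one byte value for all volumes or a per-volume mapping, and reject invalid input instead of silently falling back. This is not Ozone's actual code: the class {{ReservedSpaceParser}}, its method names, and the comma-separated {{path:bytes}} mapping format are assumptions made for illustration, and values are plain byte counts with no unit suffixes to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Ozone. */
final class ReservedSpaceParser {

  private ReservedSpaceParser() { }

  /**
   * Resolves reserved bytes per volume from one config string.
   * Accepts either a single byte count ("1024") applied to every volume,
   * or a comma-separated "path:bytes" mapping (assumed format).
   * Volumes missing from a mapping are left out of the result.
   * Throws IllegalArgumentException on any invalid input.
   */
  static Map<String, Long> parse(String value, List<String> volumes) {
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalArgumentException("Reserved space config is empty");
    }
    String trimmed = value.trim();
    Map<String, Long> reserved = new LinkedHashMap<>();
    if (!trimmed.contains(":")) {
      // Single value: use it as the default for every volume.
      long bytes = parseBytes(trimmed);
      for (String volume : volumes) {
        reserved.put(volume, bytes);
      }
      return reserved;
    }
    for (String entry : trimmed.split(",")) {
      int sep = entry.lastIndexOf(':');
      if (sep <= 0) {
        throw new IllegalArgumentException("Invalid entry: '" + entry + "'");
      }
      String path = entry.substring(0, sep).trim();
      if (!volumes.contains(path)) {
        throw new IllegalArgumentException("Unknown volume: '" + path + "'");
      }
      reserved.put(path, parseBytes(entry.substring(sep + 1).trim()));
    }
    return reserved;
  }

  private static long parseBytes(String text) {
    long bytes;
    try {
      bytes = Long.parseLong(text);
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "Invalid byte value: '" + text + "'", e);
    }
    if (bytes < 0) {
      throw new IllegalArgumentException("Negative byte value: " + bytes);
    }
    return bytes;
  }
}
```

In this sketch the caller would let the exception propagate during datanode startup, so a typo in the config stops the node with a clear error rather than leaving it running with an unintended reservation.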

  was: Currently {{hdds.datanode.dir.du.reserved}} requires a mapping of volume 
paths to space reserved values, meaning it cannot take a default value. Since 
the config is a string, we can update it to support either a volume name 
mapping, or a single byte value that applies to all volumes. This way we can 
give it a default value that can be refined on a per-volume basis if needed.


> Improvements to datanode DU space reservation configurations
> ------------------------------------------------------------
>
>                 Key: HDDS-10721
>                 URL: https://issues.apache.org/jira/browse/HDDS-10721
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Datanode
>            Reporter: Ethan Rose
>            Priority: Major


