[jira] [Updated] (HDDS-15455) Implement Custom DataNode Container Directory Discovery and Duplicate Detection

Sreeja (Jira) Tue, 02 Jun 2026 06:31:08 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-15455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sreeja updated HDDS-15455:
--------------------------
    Description: 
Implement logic to traverse all storage volumes configured in 
*{{hdds.datanode.dir}}* and discover container directories present under the 
DataNode container storage hierarchy.
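
The traversal step could be sketched roughly as below. This is a minimal illustration and not the actual implementation: the class name {{ContainerDirScanSketch}} and its methods are hypothetical, and the on-disk layout it assumes (container directories named with the decimal container ID, directly under a parent whose name starts with {{containerDir}}) is an assumption. The volume roots are taken as an already-parsed list rather than read from {{hdds.datanode.dir}}.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: finds candidate container directories under volume roots. */
public class ContainerDirScanSketch {

  /**
   * Assumption: a container directory has an all-digit name and its parent
   * directory name starts with "containerDir". The real layout may differ.
   */
  static boolean looksLikeContainerDir(Path dir) {
    Path name = dir.getFileName();
    Path parent = dir.getParent();
    Path parentName = (parent == null) ? null : parent.getFileName();
    return name != null
        && parentName != null
        && name.toString().matches("\\d+")
        && parentName.toString().startsWith("containerDir")
        && Files.isDirectory(dir);
  }

  /** Walks every existing volume root and collects matching directories. */
  public static List<Path> discover(List<Path> volumeRoots) throws IOException {
    List<Path> found = new ArrayList<>();
    for (Path root : volumeRoots) {
      if (!Files.isDirectory(root)) {
        continue; // skip volumes that are missing or not directories
      }
      try (Stream<Path> walk = Files.walk(root)) {
        found.addAll(walk
            .filter(ContainerDirScanSketch::looksLikeContainerDir)
            .collect(Collectors.toList()));
      }
    }
    return found;
  }
}
```

Walking each root independently keeps every on-disk copy in the result, which matters for the duplicate check described further down.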

For each discovered container directory:
 * Extract the container ID from the directory name.
 * Collect the container directory path, storage volume, and directory size.
 * Determine the metadata status:
 ** {{*MISSING_METADATA*}} if {{metadata/\{containerId}.container}} does not 
exist.
 ** {{*INVALID_METADATA*}} if the metadata file exists but cannot be parsed, or 
if the container ID stored in the metadata does not match the directory-name 
container ID.
 ** *{{VALID}}* otherwise.
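
The per-directory checks above might look something like the following sketch. All names here ({{MetadataStatusSketch}}, {{MetadataIdReader}}, {{classify}}, {{directorySize}}) are hypothetical. Parsing of the {{.container}} file is deliberately left behind a caller-supplied hook, since the actual parser is not described in this issue; any exception from that hook is treated as a parse failure.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical sketch of the per-directory checks listed above. */
public class MetadataStatusSketch {

  /** The three statuses named in the description. */
  public enum MetadataStatus { VALID, MISSING_METADATA, INVALID_METADATA }

  /** Assumed hook: returns the container ID stored in a metadata file. */
  @FunctionalInterface
  public interface MetadataIdReader {
    long readContainerId(Path metadataFile) throws IOException;
  }

  /** Extracts the container ID from the directory name (throws if not numeric). */
  public static long containerIdOf(Path containerDir) {
    return Long.parseLong(containerDir.getFileName().toString());
  }

  /** Sums the sizes of all regular files under the directory. */
  public static long directorySize(Path dir) throws IOException {
    try (Stream<Path> walk = Files.walk(dir)) {
      return walk.filter(Files::isRegularFile).mapToLong(p -> {
        try {
          return Files.size(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }).sum();
    }
  }

  /** Applies the MISSING_METADATA / INVALID_METADATA / VALID rules. */
  public static MetadataStatus classify(Path containerDir, MetadataIdReader reader) {
    long dirNameId = containerIdOf(containerDir);
    Path metadataFile =
        containerDir.resolve("metadata").resolve(dirNameId + ".container");
    if (!Files.isRegularFile(metadataFile)) {
      return MetadataStatus.MISSING_METADATA;
    }
    try {
      long storedId = reader.readContainerId(metadataFile);
      return storedId == dirNameId
          ? MetadataStatus.VALID
          : MetadataStatus.INVALID_METADATA; // ID mismatch
    } catch (IOException | RuntimeException e) {
      return MetadataStatus.INVALID_METADATA; // unreadable or unparsable
    }
  }
}
```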

Store the results as a mapping:
{{containerId -> List<ContainerOccurrence>}}
where each occurrence contains the container directory path, volume, size, and 
metadata status.

Use this mapping to identify duplicate container directories by detecting 
container IDs associated with more than one on-disk occurrence across storage 
volumes on the same DataNode.
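
As an illustration of the mapping and the duplicate check, here is a minimal sketch. Only the {{ContainerOccurrence}} name and its four pieces of information come from the description; the field names, the enclosing {{DuplicateDetectionSketch}} class, and the {{record}} and {{findDuplicates}} helpers are assumptions made for the example.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the containerId -> occurrences mapping. */
public class DuplicateDetectionSketch {

  public enum MetadataStatus { VALID, MISSING_METADATA, INVALID_METADATA }

  /** One on-disk copy of a container; field names are assumptions. */
  public static final class ContainerOccurrence {
    public final Path containerDir;
    public final Path volume;
    public final long sizeBytes;
    public final MetadataStatus status;

    public ContainerOccurrence(Path containerDir, Path volume,
        long sizeBytes, MetadataStatus status) {
      this.containerDir = containerDir;
      this.volume = volume;
      this.sizeBytes = sizeBytes;
      this.status = status;
    }
  }

  /** Appends an occurrence, keeping every discovered copy of the container. */
  public static void record(Map<Long, List<ContainerOccurrence>> index,
      long containerId, ContainerOccurrence occurrence) {
    index.computeIfAbsent(containerId, id -> new ArrayList<>()).add(occurrence);
  }

  /** Returns only the container IDs that have more than one occurrence. */
  public static Map<Long, List<ContainerOccurrence>> findDuplicates(
      Map<Long, List<ContainerOccurrence>> index) {
    Map<Long, List<ContainerOccurrence>> duplicates = new TreeMap<>();
    for (Map.Entry<Long, List<ContainerOccurrence>> e : index.entrySet()) {
      if (e.getValue().size() > 1) {
        duplicates.put(e.getKey(), e.getValue());
      }
    }
    return duplicates;
  }
}
```

Keeping a list per container ID, rather than overwriting on a repeated ID, is what allows duplicates to be reported after the scan instead of being silently dropped during it.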

  was:
Implement logic to traverse all storage volumes configured in 
*{{hdds.datanode.dir}}* and discover container directories present on disk. For 
each container directory, collect the container ID, directory path, volume, 
size, and 
compute the metadata status :

if metadata/\{containerId}.container missing => *MISSING_METADATA*
if metadata/\{containerId}.container exists but parse fails or containerId does 
not match with directory-name containerId => *INVALID_METADATA*
otherwise => *VALID*

Store the results as a mapping of {{{}containerId -> list of disk 
occurrences{}}}, preserving all discovered copies of a container. This 
information will also be used to identify duplicate container directories by 
detecting container IDs with multiple on-disk occurrences across storage 
volumes on the same DataNode.


> Implement Custom DataNode Container Directory Discovery and Duplicate 
> Detection
> -------------------------------------------------------------------------------
>
>                 Key: HDDS-15455
>                 URL: https://issues.apache.org/jira/browse/HDDS-15455
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Sreeja
>            Assignee: Sreeja
>            Priority: Major
>
> Implement logic to traverse all storage volumes configured in 
> *{{hdds.datanode.dir}}* and discover container directories present under the 
> DataNode container storage hierarchy.
> For each discovered container directory:
>  * Extract the container ID from the directory name.
>  * Collect the container directory path, storage volume, and directory size.
>  * Determine the metadata status:
>  ** {{*MISSING_METADATA*}} if {{metadata/\{containerId}.container}} does not 
> exist.
>  ** {{*INVALID_METADATA*}} if the metadata file exists but cannot be parsed, 
> or if the container ID stored in the metadata does not match the 
> directory-name container ID.
>  ** *{{VALID}}* otherwise.
> Store the results as a mapping:
> {{containerId -> List<ContainerOccurrence>}}
> where each occurrence contains the container directory path, volume, size, 
> and metadata status.
> Use this mapping to identify duplicate container directories by detecting 
> container IDs associated with more than one on-disk occurrence across storage 
> volumes on the same DataNode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

