[
https://issues.apache.org/jira/browse/HDDS-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-13650:
----------------------------------
Labels: pull-request-available (was: )
> BackgroundContainerDataScanner should skip containers marked as DELETED
> -----------------------------------------------------------------------
>
> Key: HDDS-13650
> URL: https://issues.apache.org/jira/browse/HDDS-13650
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Gargi Jaiswal
> Assignee: Gargi Jaiswal
> Priority: Major
> Labels: pull-request-available
>
> A race condition exists between the *DiskBalancerService* and the
> {*}BackgroundContainerDataScanner{*}, causing IOExceptions when the scanner
> attempts to access a container that the disk balancer is in the process of
> deleting.
> The sequence of events is as follows:
> 1. DiskBalancerService successfully moves a container to a new volume.
> 2. It marks the original source container as {*}DELETED{*}.
> 3. It proceeds to delete the underlying files (e.g., the chunks directory) of
> the source container.
> 4. Concurrently, the *BackgroundContainerDataScanner* starts a scan and
> attempts to access the files of the source container, which may have already
> been deleted.
> This results in errors such as *Chunks dir ... does not exist* and
> *Container [id] has been deleted* in the datanode logs, which creates noise
> and suggests instability.
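The ordering in steps 2 to 4 can be reproduced in isolation with a minimal sketch. Everything here is an assumption made for illustration: `Replica`, `State`, `scan` and `reproduce` are hypothetical stand-ins, not Ozone classes, and a temporary directory plays the role of the source container's chunks directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ScanAfterDeleteSketch {
  enum State { CLOSED, DELETED }

  /** Hypothetical stand-in for a container replica; not an Ozone class. */
  static final class Replica {
    final long id;
    final Path chunksDir;
    volatile State state = State.CLOSED;

    Replica(long id, Path chunksDir) {
      this.id = id;
      this.chunksDir = chunksDir;
    }
  }

  /** Mimics a data scan that requires the chunks directory to exist. */
  static void scan(Replica replica) throws IOException {
    if (!Files.isDirectory(replica.chunksDir)) {
      throw new IOException("Chunks dir " + replica.chunksDir + " does not exist");
    }
  }

  /** Runs steps 2 to 4 in order; returns the scan error message, or null. */
  static String reproduce() throws IOException {
    Path chunks = Files.createTempDirectory("chunks");
    Replica source = new Replica(4100L, chunks);
    source.state = State.DELETED; // step 2: balancer marks the source DELETED
    Files.delete(chunks);         // step 3: balancer removes the files
    try {
      scan(source);               // step 4: scanner still visits the replica
      return null;
    } catch (IOException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(reproduce());
  }
}
```

In this sketch the scan never consults `state`, so it fails on the missing directory even though the replica was already marked DELETED before the files were removed.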
> {code:java}
> 2025-09-03 10:45:50,434 WARN [DiskBalancerService#7]-org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer: Marked container DELETED from CLOSED: KeyValueContainerData #4100 (DELETED, non-empty, ri=0, origin=[dn_8390d8c1-144a-4c53-bcf3-f3d3c080f208, pipeline_2299f7af-a809-418e-918b-89f5a41a3420])
> 2025-09-03 10:45:50,713 ERROR [ContainerDataScanner(/hadoop-ozone/datanode/data1/hdds)]-org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir /hadoop-ozone/datanode/data1/hdds/CID-de020254-efd5-4bcf-984f-034815a2566a/current/containerDir8/4100/chunks does not exist
> 2025-09-03 10:45:50,714 ERROR [ContainerDataScanner(/hadoop-ozone/datanode/data1/hdds)]-org.apache.hadoop.ozone.container.ozoneimpl.BackgroundContainerDataScanner: Container [4100] has been deleted.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Chunks directory /hadoop-ozone/datanode/data1/hdds/CID-de020254-efd5-4bcf-984f-034815a2566a/current/containerDir8/4100/chunks does not exist.
>     at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.getChunkDir(ContainerUtils.java:302)
>     at org.apache.hadoop.ozone.container.common.impl.ContainerLayoutVersion.getChunkFile(ContainerLayoutVersion.java:115)
>     at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerCheck.scanBlock(KeyValueContainerCheck.java:350)
>     at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerCheck.scanData(KeyValueContainerCheck.java:264)
>     at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerCheck.fullCheck(KeyValueContainerCheck.java:162)
>     at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.scanData(KeyValueContainer.java:949)
>     at org.apache.hadoop.ozone.container.ozoneimpl.BackgroundContainerDataScanner.scanContainer(BackgroundContainerDataScanner.java:90)
>     at org.apache.hadoop.ozone.container.ozoneimpl.AbstractBackgroundContainerScanner.scanContainers(AbstractBackgroundContainerScanner.java:115)
>     at org.apache.hadoop.ozone.container.ozoneimpl.AbstractBackgroundContainerScanner.runIteration(AbstractBackgroundContainerScanner.java:78)
> {code}
> *Proposed Solution:*
> The *BackgroundContainerDataScanner* should skip containers that are marked
> DELETED.
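A minimal sketch of the proposed skip, under stated assumptions: `Replica`, `State`, `shouldScan` and `scanAll` are hypothetical names used only for illustration and do not reflect the actual Ozone scanner API or the change in the linked pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipDeletedSketch {
  enum State { OPEN, CLOSED, DELETED }

  /** Hypothetical stand-in for a container replica; not an Ozone class. */
  static final class Replica {
    final long id;
    final State state;

    Replica(long id, State state) {
      this.id = id;
      this.state = state;
    }
  }

  /** A replica marked DELETED is not eligible for a data scan. */
  static boolean shouldScan(Replica replica) {
    return replica.state != State.DELETED;
  }

  /** Returns the ids of the replicas that would actually be scanned. */
  static List<Long> scanAll(List<Replica> replicas) {
    List<Long> scanned = new ArrayList<>();
    for (Replica replica : replicas) {
      if (!shouldScan(replica)) {
        continue; // files may already be gone, so skip instead of failing
      }
      scanned.add(replica.id); // a real scanner would read chunk files here
    }
    return scanned;
  }

  public static void main(String[] args) {
    List<Replica> replicas = Arrays.asList(
        new Replica(1L, State.CLOSED),
        new Replica(4100L, State.DELETED));
    System.out.println(scanAll(replicas));
  }
}
```

A state check before the scan narrows the window but may not close it entirely, since a container could still be marked DELETED after the check passes. Re-checking the state when a scan fails, before logging an error, is one possible complement.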