Gargi-jais11 opened a new pull request, #9220: URL: https://github.com/apache/ozone/pull/9220
## What changes were proposed in this pull request?

A race condition exists between `DiskBalancerService` and `BackgroundContainerDataScanner`. It causes `IOException`s when the scanner tries to read a container that the disk balancer is in the middle of deleting.

The sequence of events is as follows:

1. `DiskBalancerService` successfully moves a container to a new volume.
2. It marks the original (source) container as DELETED.
3. It deletes the source container's underlying files (e.g., the `chunks` directory).
4. Concurrently, `BackgroundContainerDataScanner` starts a scan and tries to read the source container's files, which may already have been deleted.

This produces errors such as `Chunks dir ... does not exist` and `Container [id] has been deleted` in the datanode logs, which clutter the logs and suggest instability.

```
2025-09-03 10:45:50,434 WARN [DiskBalancerService#7]- org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer: Marked container DELETED from CLOSED: KeyValueContainerData #4100 (DELETED, non-empty, ri=0, origin=[dn_8390d8c1-144a-4c53-bcf3-f3d3c080f208, pipeline_2299f7af-a809-418e-918b-89f5a41a3420])
2025-09-03 10:45:50,713 ERROR [ContainerDataScanner(/hadoop-ozone/datanode/data1/hdds)]- org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir /hadoop-ozone/datanode/data1/hdds/CID-de020254-efd5-4bcf-984f-034815a2566a/current/containerDir8/4100/chunks does not exist
2025-09-03 10:45:50,714 ERROR [ContainerDataScanner(/hadoop-ozone/datanode/data1/hdds)]- org.apache.hadoop.ozone.container.ozoneimpl.BackgroundContainerDataScanner: Container [4100] has been deleted.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Chunks directory /hadoop-ozone/datanode/data1/hdds/CID-de020254-efd5-4bcf-984f-034815a2566a/current/containerDir8/4100/chunks does not exist.
	at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.getChunkDir(ContainerUtils.java:302)
	at org.apache.hadoop.ozone.container.common.impl.ContainerLayoutVersion.getChunkFile(ContainerLayoutVersion.java:115)
	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerCheck.scanBlock(KeyValueContainerCheck.java:350)
	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerCheck.scanData(KeyValueContainerCheck.java:264)
	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerCheck.fullCheck(KeyValueContainerCheck.java:162)
	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.scanData(KeyValueContainer.java:949)
	at org.apache.hadoop.ozone.container.ozoneimpl.BackgroundContainerDataScanner.scanContainer(BackgroundContainerDataScanner.java:90)
	at org.apache.hadoop.ozone.container.ozoneimpl.AbstractBackgroundContainerScanner.scanContainers(AbstractBackgroundContainerScanner.java:115)
	at org.apache.hadoop.ozone.container.ozoneimpl.AbstractBackgroundContainerScanner.runIteration(AbstractBackgroundContainerScanner.java:78)
```

**Proposed Solution:** Make `BackgroundContainerDataScanner` skip containers that are marked DELETED instead of scanning them.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13650

## How was this patch tested?

Added a test case in `TestBackgroundContainerDataScanner`. Also tested the patch together with the DiskBalancer changes; it works as expected after the change.
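The proposed fix can be illustrated with a minimal, self-contained sketch. It does not use Ozone's real classes: `SkipDeletedScannerSketch`, `State`, `Container`, `shouldScan`, `scanAll`, and `scanData` are hypothetical stand-ins, and the three-value `State` enum is an assumed simplification of the container lifecycle. It shows a scanner iteration that checks each container's state and skips DELETED ones rather than reading their (possibly already removed) files.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of a background data scanner that skips
 * containers marked DELETED. None of these types are Ozone classes.
 */
public class SkipDeletedScannerSketch {

  /** Assumed subset of container lifecycle states (not Ozone's enum). */
  enum State { OPEN, CLOSED, DELETED }

  /** Minimal stand-in for container metadata. */
  static final class Container {
    final long id;
    // Another thread (e.g. a disk balancer) may change the state at any time.
    volatile State state;

    Container(long id, State state) {
      this.id = id;
      this.state = state;
    }
  }

  /** Returns true if the scanner should read this container's data. */
  static boolean shouldScan(Container c) {
    // A DELETED container's files may already be gone, so scanning it would
    // only produce spurious "does not exist" errors.
    return c.state != State.DELETED;
  }

  /** Scans every eligible container and returns the IDs that were scanned. */
  static List<Long> scanAll(List<Container> containers) {
    List<Long> scanned = new ArrayList<>();
    for (Container c : containers) {
      if (!shouldScan(c)) {
        continue; // skip quietly instead of logging an error
      }
      scanData(c);
      scanned.add(c.id);
    }
    return scanned;
  }

  /** Hypothetical placeholder: a real scanner would verify chunk files here. */
  static void scanData(Container c) {
  }

  public static void main(String[] args) {
    List<Container> containers = List.of(
        new Container(4099, State.CLOSED),
        new Container(4100, State.DELETED), // just moved away by the balancer
        new Container(4101, State.CLOSED));
    System.out.println(scanAll(containers)); // prints [4099, 4101]
  }
}
```

Because the state check and the file access are not atomic in this sketch, a container could still be marked DELETED after `shouldScan` returns true; the check narrows the window in which the race described above produces errors, but does not by itself eliminate it.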
