Gargi-jais11 opened a new pull request, #9220:
URL: https://github.com/apache/ozone/pull/9220

   ## What changes were proposed in this pull request?
   A race condition exists between `DiskBalancerService` and `BackgroundContainerDataScanner`. It causes IOExceptions when the scanner accesses a container that the disk balancer is in the process of deleting.
   
   The sequence of events is as follows:
   1. `DiskBalancerService` successfully moves a container to a new volume.
   2. It marks the original (source) container as DELETED.
   3. It deletes the source container's underlying files (e.g., the chunks directory).
   4. Concurrently, `BackgroundContainerDataScanner` starts a scan and tries to read the source container's files, which may already have been deleted.
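
   The interleaving above can be reproduced with a small, self-contained Java model. This is a hypothetical sketch, not Ozone code: `DiskBalancerScannerRace`, `Container`, `State` and `scan` are illustrative names, and the `chunksDirExists` flag stands in for the on-disk chunks directory. A `CountDownLatch` forces the unlucky ordering, so the failure appears on every run instead of only occasionally.

   ```java
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.atomic.AtomicReference;

   // Hypothetical model of the race described above. None of these types are
   // Ozone classes; "chunksDirExists" stands in for the on-disk chunks directory.
   final class DiskBalancerScannerRace {

     enum State { CLOSED, DELETED }

     static final class Container {
       final long id;
       volatile State state = State.CLOSED;
       volatile boolean chunksDirExists = true;

       Container(long id) {
         this.id = id;
       }
     }

     /** A scanner with no state check: it fails once the chunks dir is gone. */
     static String scan(Container c) {
       if (!c.chunksDirExists) {
         return "ERROR: chunks dir of container " + c.id + " does not exist";
       }
       return "OK: container " + c.id + " scanned";
     }

     /**
      * Runs the balancer and the scanner on separate threads. The latch forces
      * the bad ordering: the scanner starts only after steps 2 and 3 are done.
      */
     static String runUnluckyInterleaving() {
       Container source = new Container(4100);
       CountDownLatch filesDeleted = new CountDownLatch(1);
       AtomicReference<String> scanResult = new AtomicReference<>();

       Thread balancer = new Thread(() -> {
         // Step 1 (copying to the destination volume) is assumed to be done.
         source.state = State.DELETED;   // Step 2: mark the source DELETED
         source.chunksDirExists = false; // Step 3: delete its chunks directory
         filesDeleted.countDown();
       });
       Thread scanner = new Thread(() -> {
         try {
           filesDeleted.await();         // Step 4: scanner reaches the container
           scanResult.set(scan(source));
         } catch (InterruptedException e) {
           Thread.currentThread().interrupt();
         }
       });

       scanner.start();
       balancer.start();
       try {
         balancer.join();
         scanner.join();
       } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
         throw new IllegalStateException("interrupted while waiting", e);
       }
       return scanResult.get();
     }

     public static void main(String[] args) {
       // Prints: ERROR: chunks dir of container 4100 does not exist
       System.out.println(runUnluckyInterleaving());
     }
   }
   ```

   The model deliberately ignores `state` inside `scan`; that omission is what lets step 4 run into the missing directory, mirroring the **Chunks dir ... does not exist** error in the datanode log.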
   
   This produces errors such as **Chunks dir ... does not exist** and **Container [id] has been deleted** in the datanode logs, which add noise and suggest instability.
   ```
   2025-09-03 10:45:50,434 WARN [DiskBalancerService#7]- org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer: Marked container DELETED from CLOSED: KeyValueContainerData #4100 (DELETED, non-empty, ri=0, origin=[dn_8390d8c1-144a-4c53-bcf3-f3d3c080f208, pipeline_2299f7af-a809-418e-918b-89f5a41a3420])
   2025-09-03 10:45:50,713 ERROR [ContainerDataScanner(/hadoop-ozone/datanode/data1/hdds)]- org.apache.hadoop.ozone.container.common.helpers.ContainerUtils: Chunks dir /hadoop-ozone/datanode/data1/hdds/CID-de020254-efd5-4bcf-984f-034815a2566a/current/containerDir8/4100/chunks does not exist
   2025-09-03 10:45:50,714 ERROR [ContainerDataScanner(/hadoop-ozone/datanode/data1/hdds)]- org.apache.hadoop.ozone.container.ozoneimpl.BackgroundContainerDataScanner: Container [4100] has been deleted.
   org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Chunks directory /hadoop-ozone/datanode/data1/hdds/CID-de020254-efd5-4bcf-984f-034815a2566a/current/containerDir8/4100/chunks does not exist.
       at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.getChunkDir(ContainerUtils.java:302)
       at org.apache.hadoop.ozone.container.common.impl.ContainerLayoutVersion.getChunkFile(ContainerLayoutVersion.java:115)
       at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerCheck.scanBlock(KeyValueContainerCheck.java:350)
       at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerCheck.scanData(KeyValueContainerCheck.java:264)
       at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerCheck.fullCheck(KeyValueContainerCheck.java:162)
       at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.scanData(KeyValueContainer.java:949)
       at org.apache.hadoop.ozone.container.ozoneimpl.BackgroundContainerDataScanner.scanContainer(BackgroundContainerDataScanner.java:90)
       at org.apache.hadoop.ozone.container.ozoneimpl.AbstractBackgroundContainerScanner.scanContainers(AbstractBackgroundContainerScanner.java:115)
       at org.apache.hadoop.ozone.container.ozoneimpl.AbstractBackgroundContainerScanner.runIteration(AbstractBackgroundContainerScanner.java:78)
   ```
   **Proposed Solution:**
   Skip the `BackgroundContainerDataScanner` data scan for containers that are marked DELETED.
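
   A minimal sketch of the proposed guard, assuming the scanner can read each container's state before scanning it. `DeletedAwareScanner`, `shouldScan` and `runIteration` are hypothetical names and simplified stand-ins; they are not Ozone's `KeyValueContainer` or `BackgroundContainerDataScanner`, and the actual patch may be structured differently.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical sketch of the proposed guard. The types below are simplified
   // stand-ins, not Ozone's container or scanner classes.
   final class DeletedAwareScanner {

     enum State { CLOSED, DELETED }

     record Container(long id, State state) {}

     /** The proposed check: containers marked DELETED are not data-scanned. */
     static boolean shouldScan(Container c) {
       return c.state() != State.DELETED;
     }

     /** Returns the IDs that were actually scanned in one iteration. */
     static List<Long> runIteration(List<Container> containers) {
       List<Long> scanned = new ArrayList<>();
       for (Container c : containers) {
         if (!shouldScan(c)) {
           // Skipped quietly instead of failing on missing chunk files.
           continue;
         }
         scanned.add(c.id()); // stand-in for the real data scan
       }
       return scanned;
     }

     public static void main(String[] args) {
       List<Container> containers = List.of(
           new Container(4099, State.CLOSED),
           new Container(4100, State.DELETED),
           new Container(4101, State.CLOSED));
       System.out.println(runIteration(containers)); // prints [4099, 4101]
     }
   }
   ```

   In this sketch the state is checked once, before the scan starts, so it covers the ordering in steps 1 to 4 (DELETED is set before the files are removed). It does not by itself cover a container that becomes DELETED while a scan of it is already running; how the patch treats that case is not described here.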
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-13650
   
   ## How was this patch tested?
   
   Added a test case in `TestBackgroundContainerDataScanner`. Also tested the patch together with the DiskBalancer changes, and it works as expected.
   

