ArafatKhan2198 opened a new pull request, #8109:
URL: https://github.com/apache/ozone/pull/8109

   ## What changes were proposed in this pull request?
   
   **Problem:**  
   When using FSO buckets, files with the same name uploaded into different 
directories were being merged into a single key record. This was because 
Recon’s container key mapping used only the volume, bucket, and file name as 
the unique identifier, which ignored the full directory path information.
   
   **Reproducing the Issue:**  
   The issue can be reproduced by creating a nested directory structure and 
uploading two files (testfile1 and testfile2) at different directory depths. 
For example, run the following commands:
   
   ```
   ozone fs -mkdir -p ofs://om/volume1/fso-bucket/dir1/dir2/dir3
   ozone fs -put -f testfile1 ofs://om/volume1/fso-bucket/dir1/
   ozone fs -put -f testfile2 ofs://om/volume1/fso-bucket/dir1/
   ozone fs -put -f testfile1 ofs://om/volume1/fso-bucket/dir1/dir2/
   ozone fs -put -f testfile2 ofs://om/volume1/fso-bucket/dir1/dir2/
   ozone fs -put -f testfile1 ofs://om/volume1/fso-bucket/dir1/dir2/dir3/
   ozone fs -put -f testfile2 ofs://om/volume1/fso-bucket/dir1/dir2/dir3/
   ```
   
   In this scenario, the same two file names (`testfile1` and `testfile2`) 
exist under three different directories (`dir1`, `dir1/dir2`, and 
`dir1/dir2/dir3`).
   
   **Root Cause:**  
   The Recon container key mapping computed a unique key from only the 
volume, bucket, and file name. For FSO buckets, the directory structure is 
encoded in the raw key prefix (using negative object IDs), but this 
information was omitted from the computed key. As a result, files with 
identical names in different directories were incorrectly merged into one 
record.
   
   **Fix:**  
   The fix updates the container key mapping logic to use the raw key prefix 
from the container key table as the unique identifier. Because the raw key 
prefix includes the complete directory structure (object IDs representing the 
volume, bucket, and directories), keys with the same file name in different 
directories (as in the scenario above) are now recognized as distinct records 
by Recon.
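   
   The difference between the two identifiers can be sketched as follows. This 
is a minimal, self-contained illustration and not the Recon implementation: 
the class `KeyPrefixSketch`, the `FsoKey` type, both key-building methods, the 
simplified key layout, and the object ID values (-1, -2, -3) are all 
hypothetical and chosen only to mirror the reproduction scenario above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Illustrative sketch only; none of these names come from the Recon code base.
final class KeyPrefixSketch {

  /** Hypothetical stand-in for a file entry in an FSO bucket. */
  static final class FsoKey {
    final String volume;
    final String bucket;
    final long parentId;    // assumed object ID of the parent directory
    final String fileName;

    FsoKey(String volume, String bucket, long parentId, String fileName) {
      this.volume = volume;
      this.bucket = bucket;
      this.parentId = parentId;
      this.fileName = fileName;
    }
  }

  // Identifier built from volume, bucket, and file name only (the old behavior
  // as described above): the directory is lost.
  static String nameOnlyKey(FsoKey k) {
    return "/" + k.volume + "/" + k.bucket + "/" + k.fileName;
  }

  // Identifier that keeps the parent directory's object ID, standing in for
  // the raw key prefix (simplified layout, assumed for illustration).
  static String rawPrefixKey(FsoKey k) {
    return "/" + k.volume + "/" + k.bucket + "/" + k.parentId + "/" + k.fileName;
  }

  // Counts how many distinct records survive under a given identifier scheme.
  static int countDistinct(List<FsoKey> keys, Function<FsoKey, String> identifier) {
    Set<String> seen = new HashSet<>();
    for (FsoKey k : keys) {
      seen.add(identifier.apply(k));
    }
    return seen.size();
  }

  // testfile1 and testfile2 under dir1, dir1/dir2, and dir1/dir2/dir3.
  static List<FsoKey> sampleKeys() {
    long[] parentIds = {-1L, -2L, -3L};  // made-up IDs for dir1, dir2, dir3
    List<FsoKey> keys = new ArrayList<>();
    for (long parentId : parentIds) {
      keys.add(new FsoKey("volume1", "fso-bucket", parentId, "testfile1"));
      keys.add(new FsoKey("volume1", "fso-bucket", parentId, "testfile2"));
    }
    return keys;
  }

  public static void main(String[] args) {
    List<FsoKey> keys = sampleKeys();
    // prints "name-only: 2"
    System.out.println("name-only: " + countDistinct(keys, KeyPrefixSketch::nameOnlyKey));
    // prints "raw prefix: 6"
    System.out.println("raw prefix: " + countDistinct(keys, KeyPrefixSketch::rawPrefixKey));
  }
}
```

   With the name-only identifier the six uploaded files collapse into two 
records, while the identifier that keeps the directory component preserves 
all six.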
   
   ## What is the link to the Apache JIRA?
   https://issues.apache.org/jira/browse/HDDS-12589
   
   ## How was this patch tested?
   
   - I manually verified the fix by running the commands above, which create 
files with the same names (`testfile1` and `testfile2`) under different 
directories, and confirmed that the container endpoint returned a separate 
record for each file.
   - Additionally, I wrote unit tests for both `ContainerKeyMapperTask` and 
the container endpoint that simulate duplicate FSO key names under different 
directories, verifying that the raw key prefix is used to differentiate the 
keys.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

