prashantwason opened a new pull request, #18035:
URL: https://github.com/apache/hudi/pull/18035

   ### Describe the issue this Pull Request addresses
   
   When the metadata table (MDT) was updated from cleanMetadata, files whose 
deletion had failed were still included in the update and so were recorded as 
deleted. As a result, subsequent clean runs did not pick up these files for 
deletion. For a tier-1 table receiving lots of updates, this resulted in a 
partition with > 1M files.
   
   This fix addresses JIRA issue: 
[HUDI-3766](https://issues.apache.org/jira/browse/HUDI-3766)
   
   ### Summary and Changelog
   
   **Summary:** Exclude failed delete files from MDT updates so they can be 
retried in subsequent clean runs.
   
   **Changes:**
   1. **CleanActionExecutor.java**: When a `FileNotFoundException` is caught 
during file deletion, return `true` instead of `false`. A file that is not 
found is treated as successfully deleted, since there is nothing left to clean 
up on the FileSystem. Because success is returned, the entry is removed from 
MDT.
   
   2. **HoodieTableMetadataUtil.java**: In 
`convertMetadataToFilesPartitionRecords()`, filter out files that are in the 
`failedDeleteFiles` list before creating the MDT update record. This ensures 
failed deletes are excluded so they can be retried.
   
   3. **TestHoodieTableMetadataUtil.java**: Added test 
`testFailedDeletesAreExcludedFromCleanMetadataRecords()` to verify that failed 
deletes are excluded from MDT updates.
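
The first two changes can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the class `CleanMetadataSketch`, the `FileDeleter` interface, the method names, and the `candidateFiles` parameter are hypothetical stand-ins, and wrapping other I/O errors in `UncheckedIOException` is a choice made only to keep the sketch short. Only the `FileNotFoundException` handling and the `failedDeleteFiles` filter mirror the description above.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CleanMetadataSketch {

    /** Hypothetical stand-in for the storage call that deletes a single file. */
    interface FileDeleter {
        boolean delete(String path) throws IOException;
    }

    /**
     * Change 1 (sketch): a file that is already gone counts as a successful
     * delete, because there is nothing left to clean up on storage.
     */
    static boolean deleteTreatingMissingAsSuccess(FileDeleter deleter, String path) {
        try {
            return deleter.delete(path);
        } catch (FileNotFoundException e) {
            // Previously reported as a failure (false); now reported as success.
            return true;
        } catch (IOException e) {
            // Assumption of this sketch: surface any other I/O error unchecked.
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Change 2 (sketch): keep only the files that are not listed in
     * failedDeleteFiles, so that failed deletes are left out of the MDT
     * update and remain visible to the next clean run.
     */
    static List<String> filesToRecordAsDeleted(List<String> candidateFiles,
                                               List<String> failedDeleteFiles) {
        Set<String> failed = new HashSet<>(failedDeleteFiles);
        List<String> recorded = new ArrayList<>();
        for (String file : candidateFiles) {
            if (!failed.contains(file)) {
                recorded.add(file);
            }
        }
        return recorded;
    }
}
```

In this sketch, a file that appears in `failedDeleteFiles` never reaches the list used to build the update, while a file that was already missing is reported as deleted and therefore does.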
   
   ### Impact
   
   This change affects the behavior of clean operations when files fail to 
delete. Previously, failed deletes would still be recorded in MDT, causing them 
to never be retried. With this fix, failed deletes are excluded from the MDT 
update, allowing subsequent clean runs to pick them up.
   
   ### Risk Level
   
   low - The change is localized to the clean action executor and metadata 
table utility. The fix ensures consistency between what is actually deleted and 
what is recorded in MDT.
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.