prashantwason opened a new pull request, #18034: URL: https://github.com/apache/hudi/pull/18034
### Describe the issue this Pull Request addresses

When a `FileNotFoundException` is thrown during file deletion in the clean operation, the file is already gone. However, the current code returns `false` (failure) which causes the Metadata Table (MDT) to not be updated. This leads to subsequent clean runs repeatedly targeting the same already-deleted files for deletion. For a tier-1 table receiving lots of updates, this resulted in a partition with > 1M files that couldn't be cleaned.

Related to HUDI-3766.

### Summary and Changelog

**Summary:** Fix clean operation to treat `FileNotFoundException` as a successful deletion so MDT is updated correctly.

**Changelog:**

- Modified `CleanActionExecutor.deleteFileAndGetResult()` to return `true` instead of `false` when catching `FileNotFoundException`
- Added debug logging for when a file is not found during clean
- Updated comment to explain the reasoning behind treating missing files as successful deletions

### Impact

This fix ensures that when files are already deleted (e.g., by a previous clean attempt or external process), the MDT is properly updated to reflect this. Without this fix, MDT retains entries for deleted files, causing repeated deletion attempts in subsequent clean runs.

### Risk Level

low - The change is minimal and follows the existing logic pattern where a missing file during deletion should be treated as success (the goal of deletion is achieved - the file doesn't exist).

### Documentation Update

none - No new configs or user-facing features are added.

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
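
### Illustrative sketch

The behavior change described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Hudi code: the `Storage` interface, the `InMemoryStorage` class, and the method signature are hypothetical stand-ins for Hudi's storage layer, and only the name `deleteFileAndGetResult` is taken from the changelog. The point is the `catch` branch, which reports a missing file as a successful deletion so the caller can record it as cleaned.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class CleanDeleteSketch {

  /** Hypothetical stand-in for the storage layer; not Hudi's actual API. */
  public interface Storage {
    boolean deleteFile(String path) throws IOException;
  }

  /** Hypothetical in-memory storage that throws when the file is absent. */
  public static class InMemoryStorage implements Storage {
    private final Set<String> files = new HashSet<>();

    public void add(String path) {
      files.add(path);
    }

    @Override
    public boolean deleteFile(String path) throws IOException {
      if (!files.remove(path)) {
        throw new FileNotFoundException("No such file: " + path);
      }
      return true;
    }
  }

  /**
   * Returns true when the file no longer exists after the call, whether it
   * was deleted now or was already missing. Other IOExceptions propagate.
   */
  public static boolean deleteFileAndGetResult(Storage storage, String path)
      throws IOException {
    try {
      return storage.deleteFile(path);
    } catch (FileNotFoundException e) {
      // The file is already gone (for example, removed by an earlier clean
      // attempt), so the goal of the deletion is met. Returning true lets
      // the caller record the file as cleaned instead of retrying forever.
      return true;
    }
  }

  public static void main(String[] args) throws IOException {
    InMemoryStorage storage = new InMemoryStorage();
    storage.add("partition/file1.parquet");

    // First attempt deletes the file; the retry finds it missing.
    System.out.println(deleteFileAndGetResult(storage, "partition/file1.parquet"));
    System.out.println(deleteFileAndGetResult(storage, "partition/file1.parquet"));
  }
}
```

In this sketch both calls in `main` print `true`: the second call hits the `FileNotFoundException` branch, which under the previous behavior described in this PR would have been reported as a failure.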
