prashantwason opened a new pull request, #18034:
URL: https://github.com/apache/hudi/pull/18034

   ### Describe the issue this Pull Request addresses
   
   When a `FileNotFoundException` is thrown during file deletion in the clean 
operation, the file is already gone. However, the current code returns `false` 
(failure), which prevents the Metadata Table (MDT) from being updated. As a 
result, subsequent clean runs repeatedly target the same already-deleted files 
for deletion.
   
   For a tier-1 table receiving a high volume of updates, this resulted in a 
partition with more than 1M files that couldn't be cleaned.
   
   Related to HUDI-3766.
   
   ### Summary and Changelog
   
   **Summary:** Fix clean operation to treat `FileNotFoundException` as a 
successful deletion so MDT is updated correctly.
   
   **Changelog:**
   - Modified `CleanActionExecutor.deleteFileAndGetResult()` to return `true` 
instead of `false` when catching `FileNotFoundException`
   - Added debug logging for when a file is not found during clean
   - Updated comment to explain the reasoning behind treating missing files as 
successful deletions
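
   The behavior change can be sketched as follows. This is a minimal, 
self-contained illustration and not the actual Hudi code: the `FileStore` 
interface and the class name `CleanDeleteSketch` are hypothetical stand-ins 
for the storage layer, and wrapping other I/O errors in `UncheckedIOException` 
is an assumption made only to keep the example short.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;

final class CleanDeleteSketch {

  /** Hypothetical stand-in for the storage layer; not Hudi's actual API. */
  interface FileStore {
    boolean delete(String path) throws IOException;
  }

  /** Returns true when the file no longer exists after the call. */
  static boolean deleteFileAndGetResult(FileStore store, String path) {
    try {
      return store.delete(path);
    } catch (FileNotFoundException e) {
      // The file is already gone, so the goal of the deletion is achieved.
      // Reporting success lets the caller record the file as cleaned.
      System.out.println("File not found during clean, treating as deleted: " + path);
      return true;
    } catch (IOException e) {
      // Any other I/O failure is still a real failure (wrapped here for brevity).
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Simulates a file that was removed by an earlier clean attempt.
    FileStore alreadyGone = p -> { throw new FileNotFoundException(p); };
    System.out.println(deleteFileAndGetResult(alreadyGone, "partition/file-1.parquet"));
  }
}
```

   Before this change, the `FileNotFoundException` branch would have returned 
`false`, which is what kept the file listed for cleaning.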
   
   ### Impact
   
   This fix ensures that when files are already deleted (e.g., by a previous 
clean attempt or external process), the MDT is properly updated to reflect 
this. Without this fix, MDT retains entries for deleted files, causing repeated 
deletion attempts in subsequent clean runs.
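
   The effect on repeated clean runs can be illustrated with a toy model. 
Everything here is hypothetical (the class `CleanRetrySketch`, the 
`missingCountsAsSuccess` flag, and the use of plain sets in place of the MDT 
and the storage layer); it only shows how a `false` result for a missing file 
keeps that file in the listing, assuming the listing is updated only for 
successful deletions.

```java
import java.util.ArrayList;
import java.util.Set;

final class CleanRetrySketch {

  /**
   * Runs one toy clean pass over every file in the metadata listing and
   * returns the number of deletion attempts. Files whose deletion is reported
   * as successful are dropped from the listing; the rest stay listed and are
   * targeted again by the next pass.
   */
  static int runClean(Set<String> metadataListing, Set<String> storage,
                      boolean missingCountsAsSuccess) {
    int attempts = 0;
    // Iterate over a copy so the listing can be modified inside the loop.
    for (String file : new ArrayList<>(metadataListing)) {
      attempts++;
      boolean existed = storage.remove(file);
      // Old behavior: a missing file counts as a failure.
      // New behavior: a missing file counts as a successful deletion.
      boolean success = existed || missingCountsAsSuccess;
      if (success) {
        metadataListing.remove(file);
      }
    }
    return attempts;
  }
}
```

   With `missingCountsAsSuccess` set to `false`, a file that is listed but 
absent from storage is attempted on every pass; with `true`, it is attempted 
once and then dropped from the listing.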
   
   ### Risk Level
   
   low - The change is minimal and follows the existing logic pattern where a 
missing file during deletion should be treated as success (the goal of deletion 
is achieved - the file doesn't exist).
   
   ### Documentation Update
   
   none - No new configs or user-facing features are added.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
