prashantwason opened a new pull request, #18047:
URL: https://github.com/apache/hudi/pull/18047

   ### Describe the issue this Pull Request addresses
   
   Closes #18046
   
   `AbstractTableFileSystemView.listPartition()` loads every file in a 
partition without checking that it is a valid HUDI data or log file. Stray 
files (temporary files, hidden files, or files with corrupted names) can then 
trigger validation exceptions later in the code, because HUDI requires file 
names to follow specific formats.
   
   ### Summary and Changelog
   
   Added filtering to ensure only valid HUDI data files (base files and log 
files) are loaded when listing partitions.
   
   **Changes:**
   - Added `filterValidDataFiles()` method that uses `FSUtils.isDataFile()` to 
validate files
   - Updated `getAllFilesInPartition()` to filter files after loading from 
metadata
   - Updated `ensurePartitionsLoadedCorrectly()` to filter files returned by 
`listPartitions()` before adding them to the view
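
For reviewers who want a feel for the filtering step, here is a minimal, standalone sketch. It does not reproduce the PR's code or Hudi's `FSUtils.isDataFile()`: the class name, the method signature taking plain file names, and both regular expressions are simplified assumptions made for illustration only.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative sketch only. The patterns are simplified assumptions and are
// NOT the patterns Hudi's FSUtils actually uses.
final class ValidDataFileFilterSketch {

  // Assumed base-file shape: <fileId>_<writeToken>_<instantTime>.<extension>
  private static final Pattern BASE_FILE =
      Pattern.compile("[^._][^_]*_[^_]+_\\d+\\.(parquet|orc|hfile)");

  // Assumed log-file shape: .<fileId>_<instantTime>.log.<version>[_<writeToken>]
  private static final Pattern LOG_FILE =
      Pattern.compile("\\.[^_]+_\\d+\\.log\\.\\d+(_.+)?");

  private ValidDataFileFilterSketch() {
  }

  // Hypothetical stand-in for a name-based validity check.
  static boolean isDataFile(String fileName) {
    return BASE_FILE.matcher(fileName).matches()
        || LOG_FILE.matcher(fileName).matches();
  }

  // Keeps only names that look like base or log files, preserving input order.
  static List<String> filterValidDataFiles(List<String> fileNames) {
    return fileNames.stream()
        .filter(ValidDataFileFilterSketch::isDataFile)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> listed = List.of(
        "abc-0_1-0-1_20240101000000.parquet",   // looks like a base file
        ".abc-0_20240101000000.log.1_1-0-1",    // looks like a log file
        "_SUCCESS",                             // stray marker file
        "part-0000.parquet.tmp");               // stray temporary file
    // Only the first two names survive the filter.
    System.out.println(filterValidDataFiles(listed));
  }
}
```

The point of filtering at listing time is that later code can parse file names without guarding against stray entries; the real check in the PR is delegated to `FSUtils.isDataFile()`.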
   
   ### Impact
   
   No public API changes. This is a defensive enhancement that filters out 
invalid files early in the file loading process, preventing potential 
validation exceptions downstream.
   
   ### Risk Level
   
   low - The change uses the existing `FSUtils.isDataFile()` method, which 
already validates file names using established patterns. All existing tests 
pass.
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

