prashantwason opened a new pull request, #18047: URL: https://github.com/apache/hudi/pull/18047
### Describe the issue this Pull Request addresses

Closes #18046

When loading files in `AbstractTableFileSystemView.listPartition()`, all files in a partition are loaded without validating that they are valid HUDI data or log files. This can cause validation exceptions later in the code when stray files (temporary files, hidden files, or files with corrupted names) are processed, since HUDI requires file names to have specific formats.

### Summary and Changelog

Added filtering to ensure only valid HUDI data files (base files and log files) are loaded when listing partitions.

**Changes:**

- Added `filterValidDataFiles()` method that uses `FSUtils.isDataFile()` to validate files
- Updated `getAllFilesInPartition()` to filter files after loading from metadata
- Updated `ensurePartitionsLoadedCorrectly()` to filter files returned by `listPartitions()` before adding them to the view

### Impact

No public API changes. This is a defensive enhancement that filters out invalid files early in the file loading process, preventing potential validation exceptions downstream.

### Risk Level

low - The change uses the existing `FSUtils.isDataFile()` method which already validates file names using established patterns. All existing tests pass.

### Documentation Update

none

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
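
### Illustrative sketch

The filtering step described in the changelog can be sketched roughly as follows. This is not the code from the PR: the class `ValidDataFileFilterSketch`, its two regular expressions, and the use of plain `String` file names are assumptions made so the example is self-contained. The real change delegates to `FSUtils.isDataFile()`, whose actual patterns may differ from the simplified ones below.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical stand-in for the filtering added in the PR; not Hudi's actual code.
class ValidDataFileFilterSketch {

  // Simplified assumption of a base file name: <fileId>_<writeToken>_<instantTime>.<ext>
  private static final Pattern BASE_FILE_SKETCH =
      Pattern.compile("[^._][^_]*_\\d+-\\d+-\\d+_\\d+\\.(parquet|orc|hfile)");

  // Simplified assumption of a log file name: .<fileId>_<instantTime>.log.<version>[_<writeToken>]
  private static final Pattern LOG_FILE_SKETCH =
      Pattern.compile("\\.[^_]+_\\d+\\.log\\.\\d+(_\\d+-\\d+-\\d+)?");

  // Plays the role that FSUtils.isDataFile() plays in the PR (assumed behavior).
  static boolean isDataFile(String fileName) {
    return BASE_FILE_SKETCH.matcher(fileName).matches()
        || LOG_FILE_SKETCH.matcher(fileName).matches();
  }

  // Keeps only names that look like base or log files; stray files are dropped.
  static List<String> filterValidDataFiles(List<String> fileNames) {
    return fileNames.stream()
        .filter(ValidDataFileFilterSketch::isDataFile)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> listed = Arrays.asList(
        "a1b2c3d4-0_1-0-1_20240101120000000.parquet",  // base file
        ".a1b2c3d4-0_20240101120000000.log.1_1-0-1",   // log file
        "_SUCCESS",                                    // stray marker file
        ".hoodie_partition_metadata",                  // hidden metadata file
        "part-0000.parquet.tmp");                      // temporary file
    System.out.println(filterValidDataFiles(listed));
  }
}
```

Filtering at listing time means that later code, which parses file IDs and instant times out of file names, only ever sees names that already match an expected format.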
