prashantwason opened a new issue, #18046:
URL: https://github.com/apache/hudi/issues/18046

   ### Describe the problem you faced
   
   When loading files in `AbstractTableFileSystemView.listPartition()`, all 
files in a partition are loaded without checking that they are valid HUDI 
base files or log files. Stray files (temporary files, hidden files, or files 
with corrupted names) are then passed to code that expects HUDI file names to 
follow specific formats, which can cause validation exceptions later on.
   
   ### Expected behavior
   
   Only valid HUDI data files (base files and log files) should be loaded when 
listing partitions. Files that don't match the expected HUDI file name patterns 
should be filtered out.
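   To illustrate what "matching the expected file name patterns" can look like, the sketch below classifies file names with two simplified regular expressions. It is a hypothetical stand-in, not `FSUtils.isDataFile()`. The class `HudiFileNameCheck`, its method, and the exact patterns are assumptions that approximate HUDI's naming conventions (base files as `<fileId>_<writeToken>_<instantTime>.parquet`, log files as `.<fileId>_<instantTime>.log.<version>_<writeToken>`). They ignore variants such as CDC or archive logs.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified file-name check for illustration only.
 * It is NOT Hudi's FSUtils.isDataFile(); the patterns approximate
 * Hudi naming conventions and skip variants (CDC logs, archive logs, ...).
 */
public class HudiFileNameCheck {

  // Assumed base file shape: <fileId>_<writeToken>_<instantTime>.<ext>,
  // e.g. "abc-0_1-2-3_20240101000000.parquet". Only parquet/orc/hfile are accepted here.
  private static final Pattern BASE_FILE =
      Pattern.compile("^[^._][^_]*_\\d+-\\d+-\\d+_\\d+\\.(parquet|orc|hfile)$");

  // Assumed log file shape: .<fileId>_<instantTime>.log.<version>[_<writeToken>],
  // e.g. ".abc-0_20240101000000.log.1_1-2-3". Note the leading dot.
  private static final Pattern LOG_FILE =
      Pattern.compile("^\\.[^_]+_\\d+\\.log\\.\\d+(_\\d+-\\d+-\\d+)?$");

  /** Returns true only if the name matches one of the two assumed patterns. */
  public static boolean looksLikeHudiDataFile(String fileName) {
    return BASE_FILE.matcher(fileName).matches() || LOG_FILE.matcher(fileName).matches();
  }

  public static void main(String[] args) {
    List<String> names = List.of(
        "abc-0_1-2-3_20240101000000.parquet",     // base file
        ".abc-0_20240101000000.log.1_1-2-3",      // log file
        ".hoodie_partition_metadata",             // partition metadata, not a data file
        "abc-0_1-2-3_20240101000000.parquet.tmp", // stray temporary file
        ".DS_Store");                             // stray hidden file
    for (String name : names) {
      System.out.println(name + " -> " + looksLikeHudiDataFile(name));
    }
  }
}
```

Only the base file and the log file print `true`. Because HUDI log file names begin with a dot, a filter that simply skipped hidden files would also discard valid log files; matching against the naming pattern avoids that.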
   
   ### Environment Description
   
   * Hudi version: master
   * Spark version: N/A
   * Storage: Any
   
   ### Additional context
   
   The fix adds a `filterValidDataFiles()` method that uses 
`FSUtils.isDataFile()` to validate files before they are added to the file 
system view. The filter is applied at two entry points:
   1. `getAllFilesInPartition()`
   2. `ensurePartitionsLoadedCorrectly()`, where `listPartitions()` is called
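   The shape of this fix can be sketched with a toy view class, assuming a much simpler model than HUDI's. `FilteredFileSystemView`, its constructor, `loadedFiles()`, and the `lister`/`isDataFile` parameters are hypothetical. The method names borrowed from the issue (`filterValidDataFiles()`, `getAllFilesInPartition()`, `ensurePartitionsLoadedCorrectly()`) do not reproduce HUDI's real signatures. What it illustrates is that both entry points route through one helper, so a file rejected on one path cannot slip in through the other.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/**
 * Hypothetical, heavily simplified file system view. It mirrors only the
 * shape of the fix (one filter shared by both listing entry points); it is
 * not Hudi's AbstractTableFileSystemView.
 */
public class FilteredFileSystemView {

  private final Function<String, List<String>> lister; // partition -> raw file names (stands in for storage listing)
  private final Predicate<String> isDataFile;          // stands in for a file-name validity check
  private final Map<String, List<String>> loaded = new HashMap<>();

  public FilteredFileSystemView(Function<String, List<String>> lister, Predicate<String> isDataFile) {
    this.lister = lister;
    this.isDataFile = isDataFile;
  }

  // Single place where stray files are dropped, so both entry points agree.
  List<String> filterValidDataFiles(List<String> files) {
    return files.stream().filter(isDataFile).collect(Collectors.toList());
  }

  // Entry point 1: direct listing of one partition.
  public List<String> getAllFilesInPartition(String partition) {
    return filterValidDataFiles(lister.apply(partition));
  }

  // Entry point 2: bulk loading of partitions that are not cached yet.
  public void ensurePartitionsLoadedCorrectly(List<String> partitions) {
    for (String partition : partitions) {
      loaded.computeIfAbsent(partition, p -> filterValidDataFiles(lister.apply(p)));
    }
  }

  // Returns what entry point 2 cached for a partition (empty if never loaded).
  public List<String> loadedFiles(String partition) {
    return loaded.getOrDefault(partition, List.of());
  }

  public static void main(String[] args) {
    Map<String, List<String>> storage = Map.of(
        "2024/01/01",
        List.of("f1-0_0-1-1_20240101000000.parquet", "f1-0_0-1-1_20240101000000.parquet.tmp"));
    // Toy predicate for the demo only; a real check would match full file-name patterns.
    FilteredFileSystemView view = new FilteredFileSystemView(
        p -> storage.getOrDefault(p, List.of()),
        name -> name.endsWith(".parquet") || name.contains(".log."));
    view.ensurePartitionsLoadedCorrectly(List.of("2024/01/01"));
    System.out.println(view.getAllFilesInPartition("2024/01/01"));
    System.out.println(view.loadedFiles("2024/01/01"));
  }
}
```

Running `main` prints `[f1-0_0-1-1_20240101000000.parquet]` twice: the `.parquet.tmp` file is dropped on both paths.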

