prashantwason commented on code in PR #18047:
URL: https://github.com/apache/hudi/pull/18047#discussion_r2761487338


##########
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##########
@@ -422,11 +422,13 @@ private void ensurePartitionsLoadedCorrectly(List<String> partitionList) {
           LOG.debug("Time taken to list partitions {} ={}", partitionSet, (endLsTs - beginLsTs));
           pathInfoMap.forEach((partitionPair, statuses) -> {
             String relativePartitionStr = partitionPair.getLeft();
-            List<HoodieFileGroup> groups = addFilesToView(relativePartitionStr, statuses);
+            // Filter out stray files that are not valid HUDI data or log files
+            List<StoragePathInfo> validDataFiles = filterValidDataFiles(statuses);

Review Comment:
   Good suggestion @the-other-tim-brown - I've moved the filtering down to 
`FileSystemBackedTableMetadata.listPartitions()`. 
   
   This is now consistent with the other methods in that class
(`getAllFilesInPartition()` and `getAllFilesInPartitions()`), which already use
`FSUtils.getAllDataFilesInPartition()` for filtering.
   
   `HoodieBackedTableMetadata.listPartitions()` delegates to
`getAllFilesInPartitions()`, which reads from the metadata table, so it won't
return stray files and doesn't need additional filtering.
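
For illustration only, here is a minimal sketch of the kind of filtering discussed above: dropping stray entries from a raw partition listing before they reach the view. It is not the code in this PR and does not call any Hudi API. The class `StrayFileFilterSketch`, the string-based `filterValidDataFiles` helper, and both file-name patterns are hypothetical, simplified assumptions about how base and log files are named.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical sketch: keeps only names that look like data or log files. */
final class StrayFileFilterSketch {

  // Assumed, simplified base-file shape: <fileId>_<writeToken>_<instantTime>.<ext>
  private static final Pattern BASE_FILE =
      Pattern.compile("[A-Za-z0-9-]+_[A-Za-z0-9-]+_\\d+\\.(parquet|orc)");

  // Assumed, simplified log-file shape: .<fileId>_<instantTime>.log.<version>[_<writeToken>]
  private static final Pattern LOG_FILE =
      Pattern.compile("\\.[A-Za-z0-9-]+_\\d+\\.log\\.\\d+(_[A-Za-z0-9-]+)?");

  private StrayFileFilterSketch() {
  }

  /** Returns true when the whole file name matches one of the assumed patterns. */
  static boolean isValidDataFile(String fileName) {
    return BASE_FILE.matcher(fileName).matches()
        || LOG_FILE.matcher(fileName).matches();
  }

  /** Filters a raw listing, preserving the original order of the surviving names. */
  static List<String> filterValidDataFiles(List<String> fileNames) {
    return fileNames.stream()
        .filter(StrayFileFilterSketch::isValidDataFile)
        .collect(Collectors.toList());
  }
}
```

Applying the filter once at listing time, rather than in each caller, means every consumer of the listing sees the same cleaned set of files.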



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
