nsivabalan commented on code in PR #18047:
URL: https://github.com/apache/hudi/pull/18047#discussion_r2748192831


##########
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##########
@@ -422,11 +422,13 @@ private void ensurePartitionsLoadedCorrectly(List<String> partitionList) {
           LOG.debug("Time taken to list partitions {} ={}", partitionSet, (endLsTs - beginLsTs));
           pathInfoMap.forEach((partitionPair, statuses) -> {
             String relativePartitionStr = partitionPair.getLeft();
-            List<HoodieFileGroup> groups = addFilesToView(relativePartitionStr, statuses);
+            // Filter out stray files that are not valid HUDI data or log files
+            List<StoragePathInfo> validDataFiles = filterValidDataFiles(statuses);

Review Comment:
   This might break XTable (formerly OneTable) sync.
   @the-other-tim-brown @vinishjail97: Is there a table config we write when
a Hudi table is created via XTable? If not, I don't think we can add this
validation here, right?
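
   To make the concern concrete, here is a rough sketch of what gating the filter could look like. Everything in it is an assumption for illustration: the class name, the `strictNaming` flag (standing in for whatever table config might exist), and the simplified file-name pattern are hypothetical and are not taken from this PR or from Hudi's actual naming logic. It also assumes that externally written files do not follow Hudi-style names.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StrayFileFilterSketch {
  // Simplified, ASSUMED pattern for Hudi-style base file names:
  // <fileId>_<writeToken>_<instantTime>.<extension>
  // This is not the regex Hudi itself uses.
  private static final Pattern HUDI_STYLE_BASE_FILE =
      Pattern.compile("^[^_.][^_]*_[^_]+_\\d+\\.(parquet|orc|hfile)$");

  // strictNaming is a hypothetical stand-in for a table config lookup.
  // When it is false, every listed file is passed through untouched.
  static List<String> filterValidDataFiles(List<String> fileNames, boolean strictNaming) {
    if (!strictNaming) {
      return fileNames;
    }
    return fileNames.stream()
        .filter(name -> HUDI_STYLE_BASE_FILE.matcher(name).matches())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> listed = List.of(
        "a1b2c3-0_1-0-1_20240101120000000.parquet", // Hudi-style name
        "part-00000-external.snappy.parquet",       // name written by another engine
        "_SUCCESS");                                // marker file
    // Strict mode keeps only names matching the assumed pattern.
    System.out.println(filterValidDataFiles(listed, true));
    // Lenient mode returns the listing unchanged.
    System.out.println(filterValidDataFiles(listed, false));
  }
}
```

   With an unconditional filter, the second file would be dropped from the view, which is the breakage I am worried about. Without a config to key off, there is no way to pick the lenient path.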



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
