KnightChess commented on code in PR #11219:
URL: https://github.com/apache/hudi/pull/11219#discussion_r1603663221


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -2000,16 +2000,15 @@ public DirectoryInfo(String relativePath, List<StoragePathInfo> pathInfos, Strin
       // Pre-allocate with the maximum length possible
       filenameToSizeMap = new HashMap<>(pathInfos.size());
 
+      // Presence of partition meta file implies this is a HUDI partition
+      isHoodiePartition = pathInfos.stream().anyMatch(status -> status.getPath().getName().startsWith(HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX));

Review Comment:
   For a partitioned table, if the parent path is already a Hudi partition, is it still necessary to check the subdirectories for partition metadata files? Could we short-circuit this check?
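A minimal sketch of the short-circuit being asked about, assuming that a subdirectory of a known Hudi partition may simply inherit the parent's status (whether it should inherit the flag or skip the check altogether is exactly the open question). `PartitionDetection`, `isHoodiePartition(boolean, List<String>)` and the `parentIsHoodiePartition` flag are hypothetical, and the prefix constant is an assumed stand-in for `HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX`. Note that `Stream.anyMatch` already stops at the first match within a single listing; the saving here comes from skipping the scan for child directories entirely.

```java
import java.util.List;

final class PartitionDetection {
  // Assumed stand-in for HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX.
  static final String PARTITION_METAFILE_PREFIX = ".hoodie_partition_metadata";

  /**
   * Hypothetical helper: when the caller already knows the parent directory is a
   * Hudi partition, return immediately instead of scanning this directory's files.
   */
  static boolean isHoodiePartition(boolean parentIsHoodiePartition, List<String> fileNames) {
    if (parentIsHoodiePartition) {
      return true; // short-circuit: no per-file scan for subdirectories
    }
    // anyMatch is itself short-circuiting: it stops at the first matching entry.
    return fileNames.stream().anyMatch(name -> name.startsWith(PARTITION_METAFILE_PREFIX));
  }
}
```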




Review Comment:
   > If you expose your Hudi table as a Delta Lake table with XTable, you will have parquet files in the _delta_log and this will lead to a parsing issue.
   
   A complete dataset can include files that conform to the Hudi filename format as well as files that do not. If the metadata table (MDT) only includes files that conform to Hudi's format, some files will be missing from it. It is not clear whether XTable has its own way of maintaining the MDT. I think this handling belongs on the XTable side, not the Hudi side. On the Hudi side, I think MDT construction should fail with an exception that prompts the user to deal with such anomalous files. What do you think?
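A rough sketch of the fail-fast behaviour proposed above: instead of silently dropping files the MDT cannot represent, the listing is validated and an exception names the offending files. Everything here is hypothetical: `StrictFileNameCheck`, `validatePartitionListing`, and the two regular expressions are simplified approximations of Hudi base-file (`<fileId>_<writeToken>_<instantTime>.<ext>`) and log-file naming, not Hudi's actual parsing code, and the prefix constant is an assumed stand-in for `HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX`.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class StrictFileNameCheck {
  // Assumed stand-in for HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX.
  static final String PARTITION_METAFILE_PREFIX = ".hoodie_partition_metadata";

  // Hypothetical, simplified patterns; real Hudi file-name parsing is more precise.
  // Base file: <fileId>_<writeToken>_<instantTime>.<ext>
  private static final Pattern BASE_FILE = Pattern.compile("[^_]+_[0-9-]+_\\d+\\.[A-Za-z]+");
  // Log file: .<fileId>_<baseInstant>.log.<version>[_<writeToken>]
  private static final Pattern LOG_FILE = Pattern.compile("\\.[^_]+_\\d+\\.log\\.\\d+(_[0-9-]+)?");

  static boolean conformsToHudiNaming(String name) {
    return name.startsWith(PARTITION_METAFILE_PREFIX)
        || BASE_FILE.matcher(name).matches()
        || LOG_FILE.matcher(name).matches();
  }

  /** Fails fast instead of building an MDT that silently omits some files. */
  static void validatePartitionListing(String relativePath, List<String> fileNames) {
    List<String> offending = fileNames.stream()
        .filter(name -> !conformsToHudiNaming(name))
        .collect(Collectors.toList());
    if (!offending.isEmpty()) {
      throw new IllegalStateException("Partition '" + relativePath
          + "' contains files that do not follow the Hudi naming format: " + offending
          + ". Remove or relocate them before building the metadata table.");
    }
  }
}
```

With this approach a Delta Lake checkpoint such as `00000000000000000010.checkpoint.parquet` would be rejected with a message listing it, rather than being skipped or causing a parsing failure later.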



-- 
This is an automated message from the Apache Git Service.