the-other-tim-brown commented on code in PR #11219:
URL: https://github.com/apache/hudi/pull/11219#discussion_r1603732565


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -2000,16 +2000,15 @@ public DirectoryInfo(String relativePath, List<StoragePathInfo> pathInfos, Strin
       // Pre-allocate with the maximum length possible
       filenameToSizeMap = new HashMap<>(pathInfos.size());
 
+      // Presence of partition meta file implies this is a HUDI partition
+      isHoodiePartition = pathInfos.stream().anyMatch(status -> status.getPath().getName().startsWith(HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX));

Review Comment:
   I think the Hudi bootstrap should only consider files that are managed by 
Hudi. In my opinion, it is important to let people keep other content alongside 
their Hudi tables. This can include adding directories under the base path 
that are not managed by Hudi, for example to store some metadata.
   
   The issue here is that if you have a Hudi table without the metadata table 
(MDT) and then turn it on, any parquet files that are not managed by Hudi will 
cause an error, even if those files are not in a data partition directory.
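
To illustrate the behavior I have in mind, here is a minimal, standalone sketch that uses plain strings instead of `StoragePathInfo`. The class name, method names, and the literal prefix value are assumptions made for illustration and are not taken from the PR. The idea is that only directories containing a partition meta file contribute files, so a stray parquet file in an unmanaged directory is skipped rather than raising an error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hudi: filters a directory listing down to
// files that live in directories marked as Hudi partitions.
public class ManagedFileFilter {
    // Assumed value of HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX.
    public static final String PARTITION_METAFILE_PREFIX = ".hoodie_partition_metadata";

    // A directory counts as a Hudi partition if it holds a partition meta file.
    public static boolean isHoodiePartition(List<String> fileNames) {
        return fileNames.stream().anyMatch(name -> name.startsWith(PARTITION_METAFILE_PREFIX));
    }

    // listing maps a relative directory path to the file names directly inside it.
    // Returns the relative paths of files in Hudi partitions, excluding the
    // partition meta file itself.
    public static List<String> managedDataFiles(Map<String, List<String>> listing) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> dir : listing.entrySet()) {
            if (!isHoodiePartition(dir.getValue())) {
                continue; // unmanaged directory: ignore its contents, do not fail
            }
            for (String name : dir.getValue()) {
                if (!name.startsWith(PARTITION_METAFILE_PREFIX)) {
                    result.add(dir.getKey() + "/" + name);
                }
            }
        }
        return result;
    }
}
```

With a listing that has one partition directory and one unmanaged directory holding a parquet file, only the file in the partition directory would be returned.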



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
