nsivabalan commented on code in PR #18182:
URL: https://github.com/apache/hudi/pull/18182#discussion_r3370188868


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -1549,8 +1558,16 @@ public static List<FileSlice> 
getPartitionLatestFileSlicesIncludingInflight(Hood
     HoodieTableFileSystemView fsView = null;
     try {
       fsView = fileSystemView.orElseGet(() -> 
getFileSystemViewForMetadataTable(metaClient));
-      Stream<FileSlice> fileSliceStream = 
fsView.getLatestFileSlicesIncludingInflight(partition);
-      return fileSliceStream
+      // Check if bucketing is enabled
+      boolean bucketingEnabled = isBucketingEnabledForMDT(metaClient);
+      List<String> partitionsToList = getPartitionsToList(metaClient, 
partition, bucketingEnabled);
+
+      List<FileSlice> fileSlices = new ArrayList<>();

Review Comment:
   same comment as above



##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -2294,6 +2311,98 @@ public static String 
deleteMetadataTablePartition(HoodieTableMetaClient dataMeta
     return null;
   }
 
+  /**
+   * Returns true if bucketing is enabled for the metadata table.
+   *
+   * <p>Reads the property from the MDT's own table config so we avoid an 
extra disk read of the
+   * data table's hoodie.properties on every file-slice lookup. The property 
is persisted on both
+   * the data table and the MDT during MDT initialization (see
+   * {@link #setMetadataTablePartitionBucketing(HoodieTableMetaClient, 
HoodieTableMetaClient, boolean)}).
+   *
+   * @param metaClient MetaClient for the metadata table
+   * @return true if bucketing is enabled
+   */
+  private static boolean isBucketingEnabledForMDT(HoodieTableMetaClient 
metaClient) {
+    return 
metaClient.getTableConfig().isMetadataTablePartitionBucketingEnabled();
+  }
+
+  /**
+   * Returns the list of partitions to list for file slices.
+   * When bucketing is enabled, returns the bucket sub-directories.
+   * When bucketing is disabled, returns the partition itself.
+   *
+   * <p>Note: bucketing is fixed at MDT initialization time (see
+   * {@link 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter#initializeFileGroups}).
 A given
+   * MDT is therefore either fully bucketed or fully non-bucketed; mixed-state 
file groups under
+   * partition/ alongside partition/&lt;bucket&gt;/ cannot occur. We 
deliberately do not also scan the
+   * partition root when bucketing is enabled.
+   *
+   * @param metaClient MetaClient for the metadata table
+   * @param partition The partition path
+   * @param bucketingEnabled Whether bucketing is enabled
+   * @return List of partition paths to list
+   */
+  private static List<String> getPartitionsToList(HoodieTableMetaClient 
metaClient, String partition, boolean bucketingEnabled) {
+    List<String> partitionsToList = new ArrayList<>();
+    try {
+      if (bucketingEnabled) {
+        // Find all the buckets in the partition
+        StoragePath partitionPath = new StoragePath(metaClient.getBasePath(), 
partition);
+        List<StoragePathInfo> statuses = 
metaClient.getStorage().listDirectEntries(partitionPath);
+        for (StoragePathInfo status : statuses) {
+          if (status.isDirectory()) {

Review Comment:
   should we also check for existence of the partition meta file in addition to 
`isDirectory`



##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -1514,22 +1515,30 @@ private static List<FileSlice> 
getPartitionFileSlices(HoodieTableMetaClient meta
     HoodieTableFileSystemView fsView = null;
     try {
       fsView = fileSystemView.orElseGet(() -> 
getFileSystemViewForMetadataTable(metaClient));
-      Stream<FileSlice> fileSliceStream;
-      if (mergeFileSlices) {
-        if 
(metaClient.getActiveTimeline().filterCompletedInstants().lastInstant().isPresent())
 {
-          fileSliceStream = fsView.getLatestMergedFileSlicesBeforeOrOn(
-              // including pending compaction instant as the last instant so 
that the finished delta commits
-              // that start earlier than the compaction can be queried.
-              partition, 
metaClient.getActiveTimeline().filterCompletedAndCompactionInstants().lastInstant().get().requestedTime());
+      // Check if bucketing is enabled
+      boolean bucketingEnabled = isBucketingEnabledForMDT(metaClient);
+      List<String> partitionsToList = getPartitionsToList(metaClient, 
partition, bucketingEnabled);
+
+      List<FileSlice> fileSlices = new ArrayList<>();

Review Comment:
   we are converting this to stream at L 1539. why not maintain it as stream 
only



##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -2294,6 +2311,98 @@ public static String 
deleteMetadataTablePartition(HoodieTableMetaClient dataMeta
     return null;
   }
 
+  /**
+   * Returns true if bucketing is enabled for the metadata table.
+   *
+   * <p>Reads the property from the MDT's own table config so we avoid an 
extra disk read of the
+   * data table's hoodie.properties on every file-slice lookup. The property 
is persisted on both
+   * the data table and the MDT during MDT initialization (see
+   * {@link #setMetadataTablePartitionBucketing(HoodieTableMetaClient, 
HoodieTableMetaClient, boolean)}).
+   *
+   * @param metaClient MetaClient for the metadata table
+   * @return true if bucketing is enabled
+   */
+  private static boolean isBucketingEnabledForMDT(HoodieTableMetaClient 
metaClient) {
+    return 
metaClient.getTableConfig().isMetadataTablePartitionBucketingEnabled();
+  }
+
+  /**
+   * Returns the list of partitions to list for file slices.
+   * When bucketing is enabled, returns the bucket sub-directories.
+   * When bucketing is disabled, returns the partition itself.
+   *
+   * <p>Note: bucketing is fixed at MDT initialization time (see
+   * {@link 
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter#initializeFileGroups}).
 A given
+   * MDT is therefore either fully bucketed or fully non-bucketed; mixed-state 
file groups under
+   * partition/ alongside partition/&lt;bucket&gt;/ cannot occur. We 
deliberately do not also scan the
+   * partition root when bucketing is enabled.
+   *
+   * @param metaClient MetaClient for the metadata table
+   * @param partition The partition path
+   * @param bucketingEnabled Whether bucketing is enabled
+   * @return List of partition paths to list
+   */
+  private static List<String> getPartitionsToList(HoodieTableMetaClient 
metaClient, String partition, boolean bucketingEnabled) {
+    List<String> partitionsToList = new ArrayList<>();
+    try {
+      if (bucketingEnabled) {
+        // Find all the buckets in the partition
+        StoragePath partitionPath = new StoragePath(metaClient.getBasePath(), 
partition);
+        List<StoragePathInfo> statuses = 
metaClient.getStorage().listDirectEntries(partitionPath);
+        for (StoragePathInfo status : statuses) {
+          if (status.isDirectory()) {
+            partitionsToList.add(partition + StoragePath.SEPARATOR + 
status.getPath().getName());
+          }
+        }
+        log.info("Listing {} buckets in metadata table partition {}", 
partitionsToList.size(), partition);
+      } else {
+        // Only list the partition folder as bucketing is not enabled
+        partitionsToList.add(partition);
+        log.info("Listing non-bucketed metadata table partition {}", 
partition);
+      }
+    } catch (IOException e) {
+      throw new HoodieMetadataException("Failed to check for bucketing in 
metadata table partition " + partition, e);
+    }
+    return partitionsToList;
+  }
+
+  /**
+   * Set the bucketing of partitions within the metadata table.
+   *
+   * <p>The property is persisted on both the data table (authoritative source 
for MDT initialization
+   * decisions) and the MDT (to allow reader code to determine bucketing 
without loading data table
+   * properties on every call).
+   *
+   * @param dataMetaClient     MetaClient for the dataset
+   * @param metadataMetaClient MetaClient for the MDT (may be null if MDT is 
not yet initialized)
+   * @param enabled            If true, enable bucketing for MDT partitions; 
if false, disable it
+   */
+  public static HoodieTableMetaClient 
setMetadataTablePartitionBucketing(HoodieTableMetaClient dataMetaClient,
+                                                                         
HoodieTableMetaClient metadataMetaClient, boolean enabled) {
+    // Only allowed on the main dataset
+    
ValidationUtils.checkArgument(!isMetadataTable(dataMetaClient.getBasePath().toString()),
 "Bucketing should only be enabled on the main dataset");
+
+    // Persist on the MDT first, then on the data table. The data table is the 
authoritative source for the
+    // retry guard in HoodieBackedTableMetadataWriter#initializeFileGroups
+    // (bucketingEnabled && 
!dataMetaClient...isMetadataTablePartitionBucketingEnabled()). Updating the data
+    // table last ensures that if the MDT update fails the data table stays 
un-flipped, so the next retry
+    // naturally re-attempts both updates rather than leaving the two configs 
permanently desynced.
+    if (metadataMetaClient != null) {
+      // Persist on MDT so reader code can determine bucketing from MDT props 
alone.
+      
metadataMetaClient.getTableConfig().setMetadataTablePartitionBucketing(enabled);

Review Comment:
   we can't add adhoc properties to table properties. 
   
   lets think through a generic solution. 
   for eg,
   adding bucketing to all partitions in mdt. 
   or add a config property w/ value listing all mdt partitions w/ bucketing 
enabled. 
   by default it will be empty. 
   
   btw, do we even need a property added to data table for this purpose. 
   
   we can just add only to metadata table right?
   all readers and writers should have access to mdt table config when in need. 
   



##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -2294,6 +2311,98 @@ public static String 
deleteMetadataTablePartition(HoodieTableMetaClient dataMeta
     return null;
   }
 
+  /**
+   * Returns true if bucketing is enabled for the metadata table.
+   *
+   * <p>Reads the property from the MDT's own table config so we avoid an 
extra disk read of the
+   * data table's hoodie.properties on every file-slice lookup. The property 
is persisted on both
+   * the data table and the MDT during MDT initialization (see
+   * {@link #setMetadataTablePartitionBucketing(HoodieTableMetaClient, 
HoodieTableMetaClient, boolean)}).
+   *
+   * @param metaClient MetaClient for the metadata table
+   * @return true if bucketing is enabled
+   */
+  private static boolean isBucketingEnabledForMDT(HoodieTableMetaClient 
metaClient) {
+    return 
metaClient.getTableConfig().isMetadataTablePartitionBucketingEnabled();

Review Comment:
   these getters should be fixed once we fix the config property name



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1187,9 +1250,16 @@ private void initializeFileGroups(HoodieTableMetaClient 
dataMetaClient, Metadata
           writer.appendBlock(block);
         }
       } catch (InterruptedException e) {
-        throw new HoodieException(String.format("Failed to created fileGroup 
%s for partition %s", fileGroupFileId, relativePartitionPath), e);
+        throw new HoodieException(String.format("Failed to created fileGroup 
%s for partition %s", fileGroupFileId, relativePath), e);
       }
-    }, fileGroupFileIds.size());
+    }, fileGroupIdAndPathPairList.size());
+
+    if (bucketingEnabled && 
!dataMetaClient.getTableConfig().isMetadataTablePartitionBucketingEnabled()) {
+      // Bucketing has been enabled. Update the instance field so subsequent 
partition initializations
+      // in this writer see the refreshed table config. The parameter would 
otherwise shadow the field,
+      // leaving the writer with a stale view that bucketing is still disabled.
+      this.dataMetaClient = 
HoodieTableMetadataUtil.setMetadataTablePartitionBucketing(dataMetaClient, 
metadataMetaClient, bucketingEnabled);

Review Comment:
     ⚠️  setMetadataTablePartitionBucketing should be hoisted out of 
initializeFileGroups. Three concerns with the current placement:                
                                             
     
     1. Ordering hazard: file groups are written before the table property is 
persisted. A crash between the foreach and the property update leaves a 
bucketed on-disk layout with               
     bucketing.enable=false — readers won't discover the file groups.           
                                                                                
                               
     2. Mid-loop metaClient reload in initializeFromFilesystem invalidates any 
references captured earlier in the loop (e.g., 
lazyLatestMergedPartitionFileSliceList).                           
     3. Mixing concerns: bucketing is a once-per-MDT decision, but the setter 
is invoked inside per-partition-init code with a guard.                         
                                   
                                                                                
                                                                                
                                 
     Suggest hoisting to initializeFromFilesystem, immediately after 
initializeMetaClient(), so the property is set exactly once before any file 
groups land. initializeFileGroups then just     
     reads isMetadataTablePartitionBucketingEnabled() to pick layout.           
                                                                                
                                 
                                                                                
                                                                                
                                   
     Bottom line: yes, the current placement is suboptimal. Hoisting it to 
initializeFromFilesystem improves correctness (crash-safety ordering), 
simplicity (single decision point), and          
     robustness (no mid-loop metaClient reload).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to