prashantwason commented on code in PR #18182:
URL: https://github.com/apache/hudi/pull/18182#discussion_r3230492213


##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -2281,6 +2298,78 @@ public static String 
deleteMetadataTablePartition(HoodieTableMetaClient dataMeta
     return null;
   }
 
+  /**
+   * Returns true if bucketing is enabled for the metadata table.
+   *
+   * @param metaClient MetaClient for the metadata table
+   * @return true if bucketing is enabled
+   */
+  private static boolean isBucketingEnabledForMDT(HoodieTableMetaClient 
metaClient) {
+    // Bucketing status is saved within the main dataset's properties.
+    String basePath = 
HoodieTableMetadata.getDataTableBasePathFromMetadataTable(metaClient.getBasePath().toString());
+    return HoodieTableMetaClient.builder()
+        .setBasePath(basePath)
+        .setConf(metaClient.getStorageConf().newInstance())
+        .setLoadActiveTimelineOnLoad(false)
+        .build()
+        .getTableConfig()
+        .isMetadataTablePartitionBucketingEnabled();
+  }
+
+  /**
+   * Returns the list of partitions to list for file slices.
+   * When bucketing is enabled, returns the bucket sub-directories.
+   * When bucketing is disabled, returns the partition itself.
+   *
+   * @param metaClient MetaClient for the metadata table
+   * @param partition The partition path
+   * @param bucketingEnabled Whether bucketing is enabled
+   * @return List of partition paths to list
+   */
+  private static List<String> getPartitionsToList(HoodieTableMetaClient 
metaClient, String partition, boolean bucketingEnabled) {
+    List<String> partitionsToList = new ArrayList<>();
+    try {
+      if (bucketingEnabled) {
+        // Find all the buckets in the partition
+        StoragePath partitionPath = new StoragePath(metaClient.getBasePath(), 
partition);

Review Comment:
   Acknowledged, but deferring within this PR. `listDirectEntries` is 
structurally needed to enumerate which buckets exist for a partition, since the 
bucket count is per-partition (= ceil(fileGroupCount / bucketSize)) and 
`fileGroupCount` is not currently tracked per partition in MDT table config. 
Eliminating the listing cleanly requires persisting per-partition bucket counts 
at MDT init time, which is an invasive change worth tracking separately. The 
cost in the current implementation is one `listDirectEntries` per partition per 
call (not per file slice), which is bounded.



##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java:
##########
@@ -628,6 +628,16 @@ public long getMaxLogFileSize() {
     return getLong(MAX_LOG_FILE_SIZE_BYTES_PROP);
   }
 
+  public static final ConfigProperty<Boolean> FILE_GROUP_BUCKETING_ENABLE = 
ConfigProperty
+          .key(METADATA_PREFIX + ".file.group.bucketing.enable")
+          .defaultValue(false)
+          .withDocumentation("This flag indicates whether there should be 
intermediate partitions under partitions such as record index and filelisting");

Review Comment:
   Added `checkArgument(bucketSize > 0, ...)` in `getFileGroupBucketSize()` in 
39755a7e so a misconfiguration fails fast rather than producing an 
`ArithmeticException` deep inside MDT init.



##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestHoodieBackedMetadata.java:
##########
@@ -1458,11 +1460,10 @@ private void 
verifyMetadataRecordKeyExcludeFromPayloadLogFiles(HoodieTable table
    * these records should have additional meta fields in the payload. When key 
deduplication
    * is enabled, these records on the disk should have key in the payload as 
empty string.
    *
-   * @param table
    * @param logFiles         - Metadata table log files to be verified
    * @param enableMetaFields - Enable meta fields for records
    */
-  private static void verifyMetadataRawRecords(HoodieTable table, 
List<HoodieLogFile> logFiles, boolean enableMetaFields) throws IOException {
+private static void verifyMetadataRawRecords(HoodieTable table, 
List<HoodieLogFile> logFiles, boolean enableMetaFields) throws IOException {
     HoodieStorage storage = table.getStorage();

Review Comment:
   Confirmed real compile error (signature was 3-arg with `HoodieTable table`, 
two of three call sites passed only 2 args). Took Greptile's clean-fix path in 
39755a7e: changed the signature to take `HoodieStorage storage` instead of 
`HoodieTable table`, removed the now-unused `HoodieSparkTable.create(...)` at 
the third call site, and updated all three call sites to pass `storage` (which 
is already in scope via the metaClient or as a parameter). All three callers 
are now consistent.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1039,17 +1039,58 @@ private List<DirectoryInfo> 
listAllPartitionsFromMDT(String initializationTime,
   }
 
   /**
-   * Initialize file groups for a partition. For file listing, we just have 
one file group.
-   * <p>
-   * All FileGroups for a given metadata partition has a fixed prefix as per 
the {@link MetadataPartitionType#getFileIdPrefix()}.
-   * Each file group is suffixed with 4 digits with increments of 1 starting 
with 0000.
-   * <p>
-   * Let's say we configure 10 file groups for record level index partition, 
and prefix as "record-index-bucket-"
-   * File groups will be named as :
-   * record-index-bucket-0000, .... -> ..., record-index-bucket-0009
+   * Initialize file groups for a MDT partition.
+   *
+   * All FileGroups for a given metadata partition have a fixed prefix as per 
the {@link MetadataPartitionType#getFileIdPrefix()}.
+   * Each file group is suffixed with 4 digits with increments of 1 starting 
with 0000. To limit the number of files that
+   * are created in a single folder, the file groups can be arranged in 
buckets.
+   *
+   * ==== When file group bucketing is disabled
+   * Let's say we configure 2000 file groups for record index partition. File 
groups will be created as follows inside the
+   * record_index partition directory:
+   *    record_index/
+   *    record_index/.hoodie_partition_metadata
+   *    record_index/record-index-0000
+   *    record_index/record-index-0001
+   *    ...
+   *    record_index/record-index-1999
+   *
+   *  So 1000 file groups are created in the metadata partition directory.
+   *
+   * ==== When file group bucketing is enabled
+   * Let's say we configure 2000 file groups for record index partition. File 
groups will be created as follows in 2
+   * buckets (0000 and 0001) in the record_index partition directory.:
+   *    record_index/
+   *    record_index/000/.hoodie_partition_metadata
+   *    record_index/0000/record-index-0000

Review Comment:
   Fixed both in 39755a7e: "1000 file groups" -> "2000 file groups" (matches 
the example) and `record_index/000/` -> `record_index/0000/` (4 digits, 
consistent with the surrounding entries).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to