cshuo commented on code in PR #18790:
URL: https://github.com/apache/hudi/pull/18790#discussion_r3295826115


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/GlobalRecordIndexPartitioner.java:
##########
@@ -74,19 +83,44 @@ public int partition(HoodieKey recordKey, int numPartitions) {
     return fgIndex % numPartitions;
   }
 
+  /**
+   * Returns the number of RLI shards (file group indices in [0, numFileGroups)) assigned to the given task.
+   *
+   * <p>The assignment follows the same modulo logic used in {@link #partition}: shard {@code fgIndex}
+   * is owned by task {@code fgIndex % numPartitions}. The count is {@code numFileGroups / numPartitions},
+   * plus one for tasks whose index is less than {@code numFileGroups % numPartitions}.
+   */
+  public static int computeNumShardsAssigned(int taskIndex, int numPartitions, int numFileGroups) {
+    int base = numFileGroups / numPartitions;
+    int remainder = numFileGroups % numPartitions;
+    return taskIndex < remainder ? base + 1 : base;
+  }
+
   /**
    * Get the number of file groups for record index partition in metadata table.
    */
   private int getNumFileGroupsForRecordIndexPartition() {
-    HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
-    try (HoodieTableMetadata metadataTable = metaClient.getTableFormat().getMetadataFactory().create(
-        HoodieFlinkEngineContext.DEFAULT,
-        metaClient.getStorage(),
-        StreamerUtil.metadataConfig(conf),
-        conf.get(FlinkOptions.PATH))) {
-      return metadataTable.getNumFileGroupsForPartition(MetadataPartitionType.RECORD_INDEX);
-    } catch (Exception e) {
-      throw new HoodieException("Failed to get file group count for global record index partition.", e);
-    }
+    return fetchNumFileGroupsForRecordIndexPartition(conf);
+  }
+
+  /**
+   * Reads the file group count for the record index partition from the metadata table.
+   * Results are cached per table path within the JVM so that multiple callers
+   * (e.g. the partitioner and {@link BucketAssignFunction}) share a single lookup.
+   */
+  static int fetchNumFileGroupsForRecordIndexPartition(Configuration conf) {
+    String tablePath = conf.get(FlinkOptions.PATH);
+    return NUM_FILE_GROUPS_CACHE.computeIfAbsent(tablePath, path -> {
+      HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
+      try (HoodieTableMetadata metadataTable = metaClient.getTableFormat().getMetadataFactory().create(
+          HoodieFlinkEngineContext.DEFAULT,
+          metaClient.getStorage(),
+          StreamerUtil.metadataConfig(conf),
+          conf.get(FlinkOptions.PATH))) {

Review Comment:
   nit: use `tablePath` here directly instead of calling `conf.get(FlinkOptions.PATH)` again.
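
   To illustrate the nit, here is a minimal, self-contained sketch of the caching pattern, not the actual Hudi code. `FileGroupCountCache`, `fetch`, and the `loader` argument are hypothetical stand-ins for `NUM_FILE_GROUPS_CACHE`, `fetchNumFileGroupsForRecordIndexPartition`, and the metadata table lookup. The point is that the `computeIfAbsent` lambda already receives the table path as its key, so the path does not need to be read from the configuration a second time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;

final class FileGroupCountCache {
  // Hypothetical stand-in for NUM_FILE_GROUPS_CACHE in the diff, keyed by table path.
  private static final Map<String, Integer> CACHE = new ConcurrentHashMap<>();

  // `loader` is an assumed placeholder for the metadata table lookup.
  static int fetch(String tablePath, ToIntFunction<String> loader) {
    // The lambda parameter `path` is the same value as `tablePath`,
    // so it can be passed straight to the lookup.
    return CACHE.computeIfAbsent(tablePath, path -> loader.applyAsInt(path));
  }

  public static void main(String[] args) {
    AtomicInteger lookups = new AtomicInteger();
    ToIntFunction<String> loader = path -> {
      lookups.incrementAndGet();
      return 8; // made-up file group count, for illustration only
    };
    int first = fetch("/tmp/example_table", loader);
    int second = fetch("/tmp/example_table", loader);
    // The second call is served from the cache, so the loader runs once.
    System.out.println(first + " " + second + " lookups=" + lookups.get());
  }
}
```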

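   As a side check on the Javadoc for `computeNumShardsAssigned` in the hunk above, the closed-form count can be compared against a brute-force walk over the shards using the same `fgIndex % numPartitions` rule. This is only a sketch: `ShardCountCheck` and `countByEnumeration` are hypothetical names, and the ranges tested in `main` are arbitrary.

```java
final class ShardCountCheck {
  // Same formula as computeNumShardsAssigned in the diff.
  static int computeNumShardsAssigned(int taskIndex, int numPartitions, int numFileGroups) {
    int base = numFileGroups / numPartitions;
    int remainder = numFileGroups % numPartitions;
    return taskIndex < remainder ? base + 1 : base;
  }

  // Hypothetical helper: counts shards owned by a task by applying
  // fgIndex % numPartitions to every shard index.
  static int countByEnumeration(int taskIndex, int numPartitions, int numFileGroups) {
    int count = 0;
    for (int fgIndex = 0; fgIndex < numFileGroups; fgIndex++) {
      if (fgIndex % numPartitions == taskIndex) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    for (int numPartitions = 1; numPartitions <= 16; numPartitions++) {
      for (int numFileGroups = 0; numFileGroups <= 64; numFileGroups++) {
        for (int task = 0; task < numPartitions; task++) {
          int closedForm = computeNumShardsAssigned(task, numPartitions, numFileGroups);
          int enumerated = countByEnumeration(task, numPartitions, numFileGroups);
          if (closedForm != enumerated) {
            throw new IllegalStateException("mismatch for task " + task
                + ", numPartitions " + numPartitions + ", numFileGroups " + numFileGroups);
          }
        }
      }
    }
    System.out.println("closed form matches enumeration");
  }
}
```

   For example, with 10 file groups and 4 tasks, tasks 0 and 1 own 3 shards each and tasks 2 and 3 own 2 each.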


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.