cshuo commented on code in PR #18790:
URL: https://github.com/apache/hudi/pull/18790#discussion_r3295824173


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/GlobalRecordIndexPartitioner.java:
##########
@@ -74,19 +83,44 @@ public int partition(HoodieKey recordKey, int numPartitions) {
     return fgIndex % numPartitions;
   }
 
+  /**
+   * Returns the number of RLI shards (file group indices in [0, numFileGroups)) assigned to the given task.
+   *
+   * <p>The assignment follows the same modulo logic used in {@link #partition}: shard {@code fgIndex}
+   * is owned by task {@code fgIndex % numPartitions}. The count is {@code numFileGroups / numPartitions},
+   * plus one for tasks whose index is less than {@code numFileGroups % numPartitions}.
+   */
+  public static int computeNumShardsAssigned(int taskIndex, int numPartitions, int numFileGroups) {
+    int base = numFileGroups / numPartitions;
+    int remainder = numFileGroups % numPartitions;
+    return taskIndex < remainder ? base + 1 : base;
+  }
+
   /**
    * Get the number of file groups for record index partition in metadata table.
    */
   private int getNumFileGroupsForRecordIndexPartition() {

Review Comment:
   It seems we only need to keep one of the two methods, 
`getNumFileGroupsForRecordIndexPartition` or 
`fetchNumFileGroupsForRecordIndexPartition`.
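
   For anyone checking the arithmetic in the new Javadoc, here is a minimal standalone sketch, not part of the PR. It copies `computeNumShardsAssigned` from the diff and compares it against a brute-force count that applies `fgIndex % numPartitions` to every shard. The class name `ShardCountCheck`, the helper `countByEnumeration`, and the sample sizes (4 tasks, 10 file groups) are hypothetical and chosen only for illustration.

```java
final class ShardCountCheck {

  // Copied from the diff: closed-form count of shards owned by a task.
  static int computeNumShardsAssigned(int taskIndex, int numPartitions, int numFileGroups) {
    int base = numFileGroups / numPartitions;
    int remainder = numFileGroups % numPartitions;
    return taskIndex < remainder ? base + 1 : base;
  }

  // Hypothetical helper: counts shards by walking every file group index
  // and applying the same modulo rule the partitioner uses.
  static int countByEnumeration(int taskIndex, int numPartitions, int numFileGroups) {
    int count = 0;
    for (int fgIndex = 0; fgIndex < numFileGroups; fgIndex++) {
      if (fgIndex % numPartitions == taskIndex) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    int numPartitions = 4;   // assumed parallelism, for illustration only
    int numFileGroups = 10;  // assumed number of RLI file groups
    for (int task = 0; task < numPartitions; task++) {
      int closedForm = computeNumShardsAssigned(task, numPartitions, numFileGroups);
      int enumerated = countByEnumeration(task, numPartitions, numFileGroups);
      System.out.println("task " + task + ": closedForm=" + closedForm
          + ", enumerated=" + enumerated);
    }
  }
}
```

   With these assumed sizes, tasks 0 and 1 each own three shards and tasks 2 and 3 each own two, and the two methods agree for every task.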


