hudi-agent commented on code in PR #18790:
URL: https://github.com/apache/hudi/pull/18790#discussion_r3279002396
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkBucketAssignMetrics.java:
##########
@@ -38,6 +40,13 @@ public class FlinkBucketAssignMetrics extends HoodieFlinkMetrics {
*/
private final Histogram recordBufferingTime;
+ /**
+ * Number of RLI file group shards assigned to this bucket assign task.
+ * Set once during open() when global RLI is active; remains -1 otherwise.
+ * Compare across task subtasks to detect skew in shard distribution.
+ */
+ private final AtomicInteger numShardsAssigned = new AtomicInteger(-1);
Review Comment:
🤖 nit: could you extract `-1` into a named constant (e.g. `private static
final int UNSET_SHARD_COUNT = -1`)? It's referenced in the Javadoc, here, and
in test assertions, so a constant would make the sentinel's meaning
self-evident without readers having to recall the magic value.
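
A minimal sketch of what that could look like. `ShardCountHolder` and its accessor names are hypothetical stand-ins for the relevant part of `FlinkBucketAssignMetrics`, not the PR's actual code; only the `UNSET_SHARD_COUNT` constant and the `AtomicInteger` field mirror the diff above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the metrics class; only the sentinel handling is shown. */
class ShardCountHolder {
  /** Sentinel meaning "no shard count recorded" (e.g. global RLI is not active). */
  static final int UNSET_SHARD_COUNT = -1;

  // Starts at the sentinel and is expected to be set once during open().
  private final AtomicInteger numShardsAssigned = new AtomicInteger(UNSET_SHARD_COUNT);

  /** Records the number of shards assigned to this task (assumed setter name). */
  void setNumShardsAssigned(int count) {
    numShardsAssigned.set(count);
  }

  int getNumShardsAssigned() {
    return numShardsAssigned.get();
  }

  /** Lets callers and tests check the state without repeating the literal -1. */
  boolean isShardCountSet() {
    return numShardsAssigned.get() != UNSET_SHARD_COUNT;
  }
}
```

Test assertions can then compare against `ShardCountHolder.UNSET_SHARD_COUNT` instead of a bare `-1`.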
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/GlobalRecordIndexPartitioner.java:
##########
@@ -39,6 +41,13 @@
* group and reduces small files in the metadata table.
*/
public class GlobalRecordIndexPartitioner implements Partitioner<HoodieKey> {
+ /**
+ * JVM-level cache to avoid repeatedly fetching the file group count from the metadata table.
+ * Keyed by table path; shared across all instances (partitioner + BucketAssignFunction) in the
Review Comment:
🤖 nit: it might be worth declaring the field as `ConcurrentMap<String,
Integer>` rather than the concrete `ConcurrentHashMap` — the interface type
makes the thread-safety contract visible to readers and leaves the
implementation easy to swap (e.g. a bounded cache) without touching the field
declaration.
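
A rough sketch of the idea, under assumptions: `FileGroupCountCache`, `getOrLoad`, and the loader parameter are hypothetical names for illustration, and the real field lives in `GlobalRecordIndexPartitioner` with whatever loading logic the PR uses.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical holder for a JVM-level cache keyed by table path. */
class FileGroupCountCache {
  // Declared against the interface; only this initializer names the implementation,
  // so swapping in another ConcurrentMap (e.g. a bounded cache view) touches one line.
  private static final ConcurrentMap<String, Integer> CACHE = new ConcurrentHashMap<>();

  /**
   * Returns the cached file group count for the table path, invoking the
   * loader only when no value is cached yet. The loader is an assumed
   * placeholder for the metadata table lookup.
   */
  static int getOrLoad(String tablePath, Function<String, Integer> loader) {
    return CACHE.computeIfAbsent(tablePath, loader);
  }
}
```

With `ConcurrentHashMap` as the backing map, `computeIfAbsent` is atomic per key, so concurrent callers for the same table path should trigger at most one load.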
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>