skywalker0618 commented on code in PR #18813:
URL: https://github.com/apache/hudi/pull/18813#discussion_r3307673508


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/RecordLevelIndexBackend.java:
##########
@@ -35,14 +35,18 @@
 import org.apache.hudi.exception.HoodieIOException;
 import org.apache.hudi.metadata.HoodieTableMetadata;
 import org.apache.hudi.metadata.MetadataPartitionType;
+import org.apache.hudi.metrics.FlinkIndexBackendMetrics;
 import org.apache.hudi.sink.event.Correspondent;
 import org.apache.hudi.sink.utils.SamplingActionExecutor;
 import org.apache.hudi.util.FlinkWriteClients;
 import org.apache.hudi.util.StreamerUtil;
 
+import lombok.AccessLevel;
 import lombok.Getter;
 import lombok.extern.slf4j.Slf4j;
 import org.apache.flink.configuration.Configuration;
+import org.apache.flink.metrics.MetricGroup;
+import org.apache.flink.metrics.groups.UnregisteredMetricsGroup;

Review Comment:
   Makes sense. Since bootstrap is the only remote operation in this backend, 
the remote_lookup_* metrics from FlinkIndexBackendMetrics are misleading here. 
Instead of reusing that class, I propose creating a new metrics class, say 
FlinkPartitionedIndexBackendMetrics, that records two metrics:
   
   1. partition_bootstrap_latency_millis (Histogram): distribution of 
per-bootstrap latency on this subtask. Its count equals the total number of 
bootstraps.
   2. partition_bootstrap_keys_loaded (Histogram): distribution of 
per-bootstrap key counts on this subtask. Its count also equals the total 
number of bootstraps.
   
   This keeps the bootstrap semantics distinct from the lookup-centric metrics 
used by GlobalRecordLevelIndexBackend. What do you think?
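
   To make the proposal concrete, here is a rough, dependency-free sketch of 
the shape I have in mind. The class name and the two metric names are the ones 
proposed above; everything else (SimpleHistogram, recordBootstrap, the getters) 
is a hypothetical placeholder, and a real version would presumably register 
Flink Histogram instances on the operator's MetricGroup instead of using this 
stand-in.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only: bootstrap-centric metrics for the partitioned index backend.
 * SimpleHistogram is a hypothetical stand-in for a Flink Histogram so that
 * the example compiles without Flink on the classpath.
 */
final class FlinkPartitionedIndexBackendMetrics {
  // Metric names as proposed in the review comment.
  static final String BOOTSTRAP_LATENCY = "partition_bootstrap_latency_millis";
  static final String BOOTSTRAP_KEYS_LOADED = "partition_bootstrap_keys_loaded";

  /** Hypothetical minimal histogram: keeps raw samples, exposes count and max. */
  static final class SimpleHistogram {
    private final List<Long> samples = new ArrayList<>();

    synchronized void update(long value) {
      samples.add(value);
    }

    synchronized long getCount() {
      return samples.size();
    }

    synchronized long getMax() {
      long max = 0L;
      for (long sample : samples) {
        max = Math.max(max, sample);
      }
      return max;
    }
  }

  private final SimpleHistogram bootstrapLatencyMillis = new SimpleHistogram();
  private final SimpleHistogram bootstrapKeysLoaded = new SimpleHistogram();

  /**
   * Records one completed partition bootstrap (hypothetical method name).
   * Both histograms are updated together, so each count equals the total
   * number of bootstraps on this subtask.
   */
  void recordBootstrap(long latencyMillis, long keysLoaded) {
    bootstrapLatencyMillis.update(latencyMillis);
    bootstrapKeysLoaded.update(keysLoaded);
  }

  SimpleHistogram getBootstrapLatencyMillis() {
    return bootstrapLatencyMillis;
  }

  SimpleHistogram getBootstrapKeysLoaded() {
    return bootstrapKeysLoaded;
  }
}
```

   Updating both histograms in a single call is what keeps the two counts in 
sync, so either one can be read as the bootstrap total.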


