hudi-agent commented on code in PR #18813:
URL: https://github.com/apache/hudi/pull/18813#discussion_r3286287925
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/RecordLevelIndexBackend.java:
##########
@@ -161,15 +176,30 @@ private BucketCache bootstrapPartition(String
partitionPath) {
return cache;
}
+ // Time the remote metadata-table reads (RLI bucket listing + record index
location read) so
+ // operators can observe per-partition bootstrap latency. The local cache
fill is excluded so
+ // the histogram tracks remote cost only, matching
remoteIndexLookupLatency on the global RLI path.
+ metrics.startRemoteIndexLookup();
+ final HoodiePairData<String, HoodieRecordGlobalLocation> locations;
Review Comment:
🤖 nit: the three `metrics.endRemoteIndexLookup()` calls (early-return,
success, and catch) could be collapsed into a single `try { ... } finally {
metrics.endRemoteIndexLookup(); }` around the metadata reads — easier to read
and harder to miss a path when this changes later.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/RecordLevelIndexBackend.java:
##########
@@ -161,15 +176,30 @@ private BucketCache bootstrapPartition(String
partitionPath) {
return cache;
}
+ // Time the remote metadata-table reads (RLI bucket listing + record index
location read) so
+ // operators can observe per-partition bootstrap latency. The local cache
fill is excluded so
+ // the histogram tracks remote cost only, matching
remoteIndexLookupLatency on the global RLI path.
+ metrics.startRemoteIndexLookup();
+ final HoodiePairData<String, HoodieRecordGlobalLocation> locations;
try {
Map<String, List<FileSlice>> partitionedFileGroups =
metadataTable.getBucketizedFileGroupsForPartitionedRLI(MetadataPartitionType.RECORD_INDEX);
List<FileSlice> fileSlices = partitionedFileGroups.get(partitionPath);
if (fileSlices == null || fileSlices.isEmpty()) {
+ metrics.endRemoteIndexLookup();
+ metrics.updateRemoteLookupKeysCount(0L);
return cache;
}
- HoodiePairData<String, HoodieRecordGlobalLocation> locations =
- metadataTable.readRecordIndexLocations(fileSlicesToFilter ->
fileSlices);
+ locations = metadataTable.readRecordIndexLocations(fileSlicesToFilter ->
fileSlices);
Review Comment:
🤖 `metadataTable.readRecordIndexLocations(fileSlicesFilter)` returns a lazy
`HoodieListData` (see `HoodieBackedTableMetadata#readRecordIndexLocations` —
`HoodieListData.lazy(new LazyConcatenatingIterator<>(...))`), so the actual
file-slice scan happens inside the `forEach` on line 203, not in this call.
Stopping the timer here means `remoteIndexLookupLatency` only captures the
file-slice listing + setup cost and misses the bulk of the remote read — which
contradicts the comment that says this matches the global RLI path. In
`GlobalRecordLevelIndexBackend` the corresponding `endRemoteIndexLookup()` is
called *after* the `forEach`. Could you move this `endRemoteIndexLookup()` to
after the iteration loop (and adjust the catch path so the timer is always
stopped exactly once)?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]