hudi-agent commented on code in PR #18813:
URL: https://github.com/apache/hudi/pull/18813#discussion_r3286287925


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/RecordLevelIndexBackend.java:
##########
@@ -161,15 +176,30 @@ private BucketCache bootstrapPartition(String 
partitionPath) {
       return cache;
     }
 
+    // Time the remote metadata-table reads (RLI bucket listing + record index 
location read) so
+    // operators can observe per-partition bootstrap latency. The local cache 
fill is excluded so
+    // the histogram tracks remote cost only, matching 
remoteIndexLookupLatency on the global RLI path.
+    metrics.startRemoteIndexLookup();
+    final HoodiePairData<String, HoodieRecordGlobalLocation> locations;

Review Comment:
   🤖 nit: the three `metrics.endRemoteIndexLookup()` calls (early-return, 
success, and catch) could be collapsed into a single `try { ... } finally { 
metrics.endRemoteIndexLookup(); }` around the metadata reads — easier to read 
and harder to miss a path when this changes later.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/index/RecordLevelIndexBackend.java:
##########
@@ -161,15 +176,30 @@ private BucketCache bootstrapPartition(String 
partitionPath) {
       return cache;
     }
 
+    // Time the remote metadata-table reads (RLI bucket listing + record index 
location read) so
+    // operators can observe per-partition bootstrap latency. The local cache 
fill is excluded so
+    // the histogram tracks remote cost only, matching 
remoteIndexLookupLatency on the global RLI path.
+    metrics.startRemoteIndexLookup();
+    final HoodiePairData<String, HoodieRecordGlobalLocation> locations;
     try {
       Map<String, List<FileSlice>> partitionedFileGroups =
           
metadataTable.getBucketizedFileGroupsForPartitionedRLI(MetadataPartitionType.RECORD_INDEX);
       List<FileSlice> fileSlices = partitionedFileGroups.get(partitionPath);
       if (fileSlices == null || fileSlices.isEmpty()) {
+        metrics.endRemoteIndexLookup();
+        metrics.updateRemoteLookupKeysCount(0L);
         return cache;
       }
-      HoodiePairData<String, HoodieRecordGlobalLocation> locations =
-          metadataTable.readRecordIndexLocations(fileSlicesToFilter -> 
fileSlices);
+      locations = metadataTable.readRecordIndexLocations(fileSlicesToFilter -> 
fileSlices);

Review Comment:
   🤖 `metadataTable.readRecordIndexLocations(fileSlicesFilter)` returns a lazy 
`HoodieListData` (see `HoodieBackedTableMetadata#readRecordIndexLocations` — 
`HoodieListData.lazy(new LazyConcatenatingIterator<>(...))`), so the actual 
file-slice scan happens inside the `forEach` on line 203, not in this call. 
Stopping the timer here means `remoteIndexLookupLatency` only captures the 
file-slice listing + setup cost and misses the bulk of the remote read — which 
contradicts the comment that says this matches the global RLI path. In 
`GlobalRecordLevelIndexBackend` the corresponding `endRemoteIndexLookup()` is 
called *after* the `forEach`. Could you move this `endRemoteIndexLookup()` to 
after the iteration loop (and adjust the catch path so the timer is always 
stopped exactly once)?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to