yaoliclshlmch opened a new pull request, #18761:
URL: https://github.com/apache/hudi/pull/18761

   ### Summary
   Add two Flink counter metrics for the in-memory `RecordIndexCache` used by the bucket-assign operator on the global record-level index (RLI) path:

   - `bucketassign.minibatch.cache.hit.count`: number of record-key lookups served from the in-memory cache.
   - `bucketassign.minibatch.cache.miss.count`: number of record-key lookups that fell back to a metadata-table read.

   These metrics make it possible to monitor the effectiveness of the minibatch RLI cache in production and to alert when the hit ratio degrades.
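
Both metrics are cumulative counters, so a hit ratio for alerting is usually derived from their deltas between two samples rather than from the raw totals. The following is a minimal Java sketch, assuming the two counter values are sampled periodically by whatever metrics reporter is in use. `CacheHitRatio`, `windowed`, and `shouldAlert` are hypothetical names, and the threshold is an arbitrary example:

```java
// Hypothetical helper: derives the cache hit ratio for one sampling interval
// from two samples of the cumulative hit/miss counters.
final class CacheHitRatio {

  private CacheHitRatio() {
  }

  /**
   * Returns hits / (hits + misses) for the interval between two samples,
   * or NaN when no lookups happened in that interval.
   */
  static double windowed(long prevHits, long prevMisses, long hits, long misses) {
    long deltaHits = hits - prevHits;
    long deltaMisses = misses - prevMisses;
    long lookups = deltaHits + deltaMisses;
    return lookups == 0 ? Double.NaN : (double) deltaHits / lookups;
  }

  /** Hypothetical alert rule: fire only when there was traffic and the ratio is below the threshold. */
  static boolean shouldAlert(double ratio, double threshold) {
    return !Double.isNaN(ratio) && ratio < threshold;
  }
}
```

Using deltas keeps a recent degradation from being diluted by the job's whole history. The sketch does not handle the counters resetting to zero when a task restarts.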
   
   ### Implementation
   - New `FlinkBucketAssignMetrics` class (mirrors the existing `FlinkRocksDBIndexMetrics` pattern): owns two `SimpleCounter` instances and exposes `markCacheHit()`/`markCacheHit(long)` and `markCacheMiss()`/`markCacheMiss(long)` helpers. The `(long)` overloads are no-ops when `n <= 0`, so a single call site can record a whole batch's hit and miss counts without checking for zero first.
   - `GlobalRecordLevelIndexBackend`:
     - Override `registerMetrics(MetricGroup)` to construct the new metrics class (re-registration is guarded, matching `RocksDBIndexBackend`).
     - In `get(List<String> recordKeys)`, to which the single-key `get(String)` also delegates, bump the hit and miss counters by `recordKeys.size() - missedKeys.size()` and `missedKeys.size()` respectively, after the existing cache lookup loop. The increment is null-guarded, so existing test harnesses that never call `registerMetrics` are unaffected.
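
The counting rules of the metrics class can be sketched without Flink on the classpath. This is a stdlib-only approximation, not the merged code: `CounterStub` and `MetricGroupStub` are hypothetical stand-ins for Flink's `SimpleCounter` and `MetricGroup` (the real `MetricGroup#counter(String, C)` registers the given counter and returns it), and `BucketAssignMetricsSketch` only approximates `FlinkBucketAssignMetrics`. It shows the `n <= 0` short-circuit and the re-registration guard:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for org.apache.flink.metrics.SimpleCounter. */
class CounterStub {
  private long count;

  void inc(long n) {
    count += n;
  }

  long getCount() {
    return count;
  }
}

/** Hypothetical stand-in for a Flink MetricGroup: metric name -> registered counter. */
class MetricGroupStub {
  final Map<String, CounterStub> counters = new HashMap<>();

  CounterStub counter(String name, CounterStub counter) {
    counters.put(name, counter);
    return counter;
  }
}

/** Approximation of FlinkBucketAssignMetrics; not the merged implementation. */
class BucketAssignMetricsSketch {
  static final String CACHE_HIT_COUNT = "bucketassign.minibatch.cache.hit.count";
  static final String CACHE_MISS_COUNT = "bucketassign.minibatch.cache.miss.count";

  private CounterStub hitCounter;
  private CounterStub missCounter;

  /** Registers both counters once; repeated calls keep the original instances. */
  void registerMetrics(MetricGroupStub group) {
    if (hitCounter != null) {
      return;
    }
    hitCounter = group.counter(CACHE_HIT_COUNT, new CounterStub());
    missCounter = group.counter(CACHE_MISS_COUNT, new CounterStub());
  }

  // The mark* methods assume registerMetrics has already been called.
  void markCacheHit() {
    markCacheHit(1);
  }

  /** Adds n hits; n <= 0 is a no-op, so a batch with zero hits needs no branch at the call site. */
  void markCacheHit(long n) {
    if (n > 0) {
      hitCounter.inc(n);
    }
  }

  void markCacheMiss() {
    markCacheMiss(1);
  }

  /** Adds n misses; n <= 0 is a no-op. */
  void markCacheMiss(long n) {
    if (n > 0) {
      missCounter.inc(n);
    }
  }

  long hitCount() {
    return hitCounter.getCount();
  }

  long missCount() {
    return missCounter.getCount();
  }
}
```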

   No new configuration is introduced. The metrics are registered automatically through the existing `BucketAssignFunction#initializeState` → `indexBackend.registerMetrics(...)` lifecycle hook whenever the global RLI backend is in use.

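The hit/miss accounting in `get(List<String>)` can be modeled as follows. This is a simplified, stdlib-only sketch, not the actual `GlobalRecordLevelIndexBackend`. `RliBackendSketch`, `CacheCounters`, and the `Map`-based `cache` and `metadataTable` fields are hypothetical stand-ins for `RecordIndexCache`, the metrics class, and the metadata-table read, and record locations are plain strings. Whether the real backend writes missed keys back into the cache is not described here, so the sketch leaves the cache untouched:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for the two counters (stands in for FlinkBucketAssignMetrics). */
class CacheCounters {
  long hits;
  long misses;

  void markCacheHit(long n) {
    if (n > 0) {
      hits += n;
    }
  }

  void markCacheMiss(long n) {
    if (n > 0) {
      misses += n;
    }
  }
}

/** Simplified model of the lookup path; all names here are illustrative. */
class RliBackendSketch {
  final Map<String, String> cache = new HashMap<>();         // stands in for RecordIndexCache
  final Map<String, String> metadataTable = new HashMap<>(); // stands in for the metadata-table read
  CacheCounters metrics;                                      // null until registerMetrics() runs

  /** Idempotent: a second call keeps the existing counters. */
  void registerMetrics() {
    if (metrics == null) {
      metrics = new CacheCounters();
    }
  }

  Map<String, String> get(List<String> recordKeys) {
    Map<String, String> locations = new HashMap<>();
    List<String> missedKeys = new ArrayList<>();
    // Existing cache lookup loop: serve hits, collect misses.
    for (String key : recordKeys) {
      String location = cache.get(key);
      if (location != null) {
        locations.put(key, location);
      } else {
        missedKeys.add(key);
      }
    }
    // New: one pair of counter bumps per batch, skipped when metrics were never registered.
    if (metrics != null) {
      metrics.markCacheHit(recordKeys.size() - missedKeys.size());
      metrics.markCacheMiss(missedKeys.size());
    }
    // Fall back to the metadata table for the missed keys.
    for (String key : missedKeys) {
      String location = metadataTable.get(key);
      if (location != null) {
        locations.put(key, location);
      }
    }
    return locations;
  }

  /** Single-key lookup delegates to the batch path, so it is counted as well. */
  String get(String recordKey) {
    return get(Collections.singletonList(recordKey)).get(recordKey);
  }
}
```

Counting once per batch, after the loop, keeps the metrics overhead independent of how many keys a batch contains.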
   ### Test plan
   New and existing tests (46 in total, all passing) with `-Pspark3.3,flink1.18`:

   - `TestFlinkBucketAssignMetrics` (new, 4 tests):
     - registers both counter names under the `MetricGroup`
     - `markCacheHit`/`markCacheMiss` and their `(long)` overloads increment the right counters
     - zero or negative `n` is a no-op
     - re-registration is a no-op and preserves the original counter instance
   - `TestGlobalRecordLevelIndexBackend` (3 new + 2 existing):
     - `registerMetrics` registers the two counters and is idempotent
     - cache-hit and cache-miss counts increment correctly across batch lookups and the single-key `get(String)` path
     - calling `get(...)` without registering metrics does not throw (null-guard)
   - Regression: `TestMinibatchBucketAssignFunction`, `TestBucketAssigner`, `TestRecordIndexCache`, `TestRocksDBIndexBackend`, and `TestFlinkCompactionMetrics` all still pass.
   
   ### Risk
   Low. The change is observability-only: cache lookup semantics are unchanged, there are no new public APIs and no new config keys, and the increment path is a constant-time pair of counter bumps after the existing miss-collection loop. Behavior in tests that don't register metrics is preserved by the null-guard.
   

