yaoliclshlmch opened a new pull request, #18761:
URL: https://github.com/apache/hudi/pull/18761
### Summary
Add two Flink counter metrics for the in-memory `RecordIndexCache` used by
the bucket-assign operator on the global record-level index path:
- `bucketassign.minibatch.cache.hit.count` — number of record-key lookups
served from the in-memory cache.
- `bucketassign.minibatch.cache.miss.count` — number of record-key lookups
that fell back to a metadata-table read.
These metrics make it possible to monitor the effectiveness of the minibatch
RLI cache in production and to alert when the hit ratio degrades.
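
As a rough illustration of the monitoring use case, the hit ratio can be derived from the two counter values as sketched below. The class name `CacheHitRatioSketch`, the `shouldAlert` helper, and its threshold and minimum-lookup parameters are hypothetical and not part of this PR; only the two counter names come from the change itself.

```java
/** Hypothetical helper: derives a hit ratio from the two counter values. */
final class CacheHitRatioSketch {

  /** Returns hits / (hits + misses), or 0.0 when no lookups have been counted yet. */
  static double hitRatio(long hits, long misses) {
    long total = hits + misses;
    return total == 0 ? 0.0 : (double) hits / total;
  }

  /**
   * True when at least minLookups lookups were counted and the ratio is below threshold.
   * Both parameters are assumptions chosen for illustration.
   */
  static boolean shouldAlert(long hits, long misses, double threshold, long minLookups) {
    return hits + misses >= minLookups && hitRatio(hits, misses) < threshold;
  }
}
```

Because the counters are cumulative, an alerting rule would probably feed this kind of calculation with the difference between two successive readings rather than the raw totals.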
### Implementation
- New `FlinkBucketAssignMetrics` (mirrors the existing
`FlinkRocksDBIndexMetrics` pattern): owns two `SimpleCounter` instances and
exposes `markCacheHit()` / `markCacheHit(long)` and `markCacheMiss()` /
`markCacheMiss(long)` helpers. The `(long)` overloads return early when `n <= 0`,
so a single call site can record a whole batch's hit and miss counts without
checking for empty batches.
- `GlobalRecordLevelIndexBackend`:
  - Override `registerMetrics(MetricGroup)` to construct the new metrics
    class (re-registration guarded, matching `RocksDBIndexBackend`).
  - In `get(List<String> recordKeys)` (which the single-key `get(String)`
    also delegates to), after the existing cache lookup loop, bump the hit/miss
    counters by `recordKeys.size() - missedKeys.size()` and `missedKeys.size()`
    respectively. The increment is null-guarded so existing test harnesses that
    never call `registerMetrics` are unaffected.
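
The counting logic described above could look roughly like the following self-contained sketch. It is not the code from this PR: `SketchCounter` is a stand-in for Flink's `SimpleCounter` so the example compiles without Flink on the classpath, and `BucketAssignMetricsSketch`, `CacheLookupSketch`, `collectMissedKeys`, and the `HashMap`-backed cache are hypothetical names and simplifications.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for Flink's SimpleCounter (assumption: only inc/getCount are needed here). */
final class SketchCounter {
  private long count;
  void inc() { count++; }
  void inc(long n) { count += n; }
  long getCount() { return count; }
}

/** Hypothetical shape of the metrics holder; method names follow the PR text. */
final class BucketAssignMetricsSketch {
  final SketchCounter cacheHit = new SketchCounter();
  final SketchCounter cacheMiss = new SketchCounter();

  void markCacheHit() { cacheHit.inc(); }
  void markCacheMiss() { cacheMiss.inc(); }

  // The (long) overloads ignore zero or negative counts.
  void markCacheHit(long n) { if (n > 0) { cacheHit.inc(n); } }
  void markCacheMiss(long n) { if (n > 0) { cacheMiss.inc(n); } }
}

/** Hypothetical backend: a cache lookup loop followed by null-guarded counter bumps. */
final class CacheLookupSketch {
  private final Map<String, String> cache = new HashMap<>();
  private BucketAssignMetricsSketch metrics; // stays null until registerMetrics() is called

  void put(String key, String location) { cache.put(key, location); }

  void registerMetrics() {
    if (metrics == null) { // re-registration is a no-op
      metrics = new BucketAssignMetricsSketch();
    }
  }

  BucketAssignMetricsSketch metrics() { return metrics; }

  /** Collects keys absent from the cache, then records hits and misses for the whole batch. */
  List<String> collectMissedKeys(List<String> recordKeys) {
    List<String> missedKeys = new ArrayList<>();
    for (String key : recordKeys) {
      if (!cache.containsKey(key)) {
        missedKeys.add(key);
      }
    }
    if (metrics != null) { // harnesses that never register metrics skip this branch
      metrics.markCacheHit(recordKeys.size() - missedKeys.size());
      metrics.markCacheMiss(missedKeys.size());
    }
    return missedKeys;
  }
}
```

Counting once per batch after the loop keeps the per-key path free of metric calls, which appears to be the motivation for the `(long)` overloads.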
No new configuration is introduced; metrics are registered automatically by
the existing `BucketAssignFunction#initializeState` →
`indexBackend.registerMetrics(...)` lifecycle hook whenever the global RLI
backend is in use.
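
The guarded registration mentioned above can be pictured with the minimal sketch below. `MetricGroupSketch` is a hypothetical stand-in that only records names and hands back counter holders; it does not reproduce Flink's `MetricGroup` API, and `RegistrationGuardSketch` is an illustrative name rather than a class from this PR. Only the two metric names are taken from the change.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a metric group: maps metric names to counter holders. */
final class MetricGroupSketch {
  final Map<String, long[]> registered = new LinkedHashMap<>();
  int registerCalls = 0;

  long[] register(String name) {
    registerCalls++;
    long[] counter = new long[1]; // single-slot array used as a mutable counter
    registered.put(name, counter);
    return counter;
  }
}

/** Hypothetical backend fragment showing an idempotent registerMetrics. */
final class RegistrationGuardSketch {
  static final String HIT = "bucketassign.minibatch.cache.hit.count";
  static final String MISS = "bucketassign.minibatch.cache.miss.count";

  long[] hitCounter;  // null until the first registration
  long[] missCounter;

  void registerMetrics(MetricGroupSketch group) {
    if (hitCounter != null) {
      return; // already registered: keep the original counter instances
    }
    hitCounter = group.register(HIT);
    missCounter = group.register(MISS);
  }
}
```

With this shape, a lifecycle hook that fires more than once still leaves a single pair of counters in place, so counts accumulated before the second call are not lost.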
### Test plan
New and existing tests (46 in total, all passing), run with `-Pspark3.3,flink1.18`:
- `TestFlinkBucketAssignMetrics` (new, 4 tests):
  - registers both counter names under the `MetricGroup`
  - `markCacheHit` / `markCacheMiss` and their `(long)` overloads increment
    the right counters
  - zero/negative `n` is a no-op
  - re-registration is a no-op and preserves the original counter instance
- `TestGlobalRecordLevelIndexBackend` (3 new + 2 existing):
  - `registerMetrics` registers the two counters and is idempotent
  - cache-hit / cache-miss counts increment correctly across batch lookups
    and the single-key `get(String)` path
  - calling `get(...)` without registering metrics does not throw
    (null-guard)
- Regression: `TestMinibatchBucketAssignFunction`, `TestBucketAssigner`,
  `TestRecordIndexCache`, `TestRocksDBIndexBackend`, `TestFlinkCompactionMetrics`
  all still pass.
### Risk
Low. The change is observability-only: cache lookup semantics are unchanged,
no new public APIs, no new config keys, and the increment path is a
constant-time pair of counter bumps after the existing miss-collection loop.
Behavior in tests that don't register metrics is preserved by the null-guard.