2dmurali opened a new pull request, #16590:
URL: https://github.com/apache/iceberg/pull/16590
Closes #16589
## Summary
- Wire `metricsReporter()` into `BaseIncrementalScan.planFiles()`, bringing
incremental scans to parity with batch scans (`SnapshotScan`) for metrics
reporting.
- Expose all `ScanMetricsResult` fields as Flink gauges on
`ContinuousIcebergEnumerator`, reporting per-scan (last-value) snapshots via
the coordinator metric
group.
- Accessible through Flink metric reporters (Prometheus, Datadog, JMX,
Slf4j) at path:
`coordinator.enumerator.IcebergSourceEnumerator.table.<tableName>.<metric>`
## Changes
**Core:**
- `BaseIncrementalScan.planFiles()` now emits `ScanReport` via
`metricsReporter()` (same pattern as `SnapshotScan`)
**Flink (v1.20, v2.0, v2.1):**
- New `IcebergSourceEnumeratorMetrics` class — 17 AtomicLong-backed gauges
- `ContinuousSplitPlannerImpl` creates `InMemoryMetricsReporter` per scan
cycle, attaches `ScanReport` to `ContinuousEnumerationResult`
- `ContinuousIcebergEnumerator` updates metrics after each scan discovery
- `FlinkSplitPlanner` supports optional `MetricsReporter` parameter
## Design decisions
- **Gauges (not counters):** Per-scan snapshots match Spark's reporting
model and let operators see spikes directly without applying `rate()`.
- **AtomicLong:** Gauges are read by metric reporter threads, written by
coordinator thread.
- **InMemoryMetricsReporter per cycle:** Lightweight, no accumulation,
follows existing Spark pattern.
## Testing
- Unit tests: `TestIcebergSourceEnumeratorMetrics` (all 17 gauges, null
handling, last-value semantics)
- Wiring test:
`TestContinuousIcebergEnumerator.testEnumeratorMetricsUpdatedFromScanReport`
- Integration test: `TestIcebergSourceContinuous` (MiniCluster +
InMemoryReporter)
- Core test: `TestIncrementalScanPlanningAndReporting`
- Verified end-to-end on standalone Flink 2.0 cluster
## Future work
- Batch metrics (`StaticIcebergEnumerator`) — requires different design due
to serialization boundary, planned as follow-up.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]