moomindani opened a new pull request, #16250:
URL: https://github.com/apache/iceberg/pull/16250
Closes #16169.
Adds `OtelMetricsReporter`, a vendor-neutral `MetricsReporter` that exports
Iceberg `ScanReport` and `CommitReport` via OpenTelemetry to any
OTLP-compatible backend (Prometheus, CloudWatch, Datadog, Grafana Cloud,
Honeycomb, etc.).
## Design
The reporter does **not** own the OpenTelemetry SDK. It obtains the
`OpenTelemetry` instance from `GlobalOpenTelemetry.get()`, which the host
application (Spark, Flink, Trino, ...) is expected to register via
`OpenTelemetrySdk.builder()...buildAndRegisterGlobal()` or via the
OpenTelemetry Java agent. If no SDK has been registered, OpenTelemetry returns
a no-op implementation and metric calls are silently dropped.
This mirrors the SDK-ownership philosophy established in #14360
(OpenTelemetry support in `HTTPClient`). The two PRs are complementary: #14360
instruments REST-catalog HTTP calls, while this PR instruments Iceberg-level
scan/commit reports.
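To make the ownership model concrete, here is a minimal JDK-only sketch of the "host registers once, library looks it up, fall back to no-op" behavior described above. It does not use the OpenTelemetry API. `GlobalHolderSketch`, `Meter`, `NOOP`, `register`, and `CollectingMeter` are hypothetical stand-ins. The rule that a `get()` before any registration locks in the no-op, and makes a later registration fail, is modeled on how `GlobalOpenTelemetry` behaves in recent opentelemetry-java releases as we understand it. Check this against the OpenTelemetry version you deploy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// JDK-only model of "the host owns the SDK, the library only looks it up".
// These are hypothetical stand-ins, not OpenTelemetry classes.
class GlobalHolderSketch {

    /** Hypothetical minimal meter: records a named value (or drops it). */
    interface Meter {
        void record(String name, long value);
    }

    /** No-op meter: calls are silently dropped, as when no SDK is registered. */
    static final Meter NOOP = (name, value) -> { };

    /** Hypothetical process-wide slot, loosely modeled on GlobalOpenTelemetry. */
    static final class GlobalHolder {
        private static final AtomicReference<Meter> GLOBAL = new AtomicReference<>();

        /** Called once by the host application (Spark, Flink, Trino, ...). */
        static void register(Meter meter) {
            if (!GLOBAL.compareAndSet(null, meter)) {
                // Already set, possibly to NOOP by an earlier get().
                throw new IllegalStateException("global meter already set");
            }
        }

        /** Called by the library; the first call without a registration locks in NOOP. */
        static Meter get() {
            GLOBAL.compareAndSet(null, NOOP);
            return GLOBAL.get();
        }

        /** For tests only: clears the slot. */
        static void reset() {
            GLOBAL.set(null);
        }
    }

    /** Stand-in for a real SDK whose exporter would ship these values. */
    static final class CollectingMeter implements Meter {
        final List<String> recorded = new ArrayList<>();

        @Override
        public void record(String name, long value) {
            recorded.add(name + "=" + value);
        }
    }
}
```

If the library calls `get()` before the host calls `register(...)`, it receives `NOOP` and the later registration fails. In this model, then, the host has to register first. That matches the order used in the Validation runs, where the SDK was registered before the reporter was initialized.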
## Configuration
A single catalog property registers the reporter:
```
metrics-reporter-impl=org.apache.iceberg.metrics.OtelMetricsReporter
```
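As a rough illustration of how a single catalog property can select a reporter, the JDK-only sketch below does three things. It reads `metrics-reporter-impl` from a property map, instantiates the named class through its no-argument constructor, and passes the full map to an `initialize` hook. If the property is absent, it falls back to a default. `ReporterLoaderSketch`, `Reporter`, `LoggingReporter`, `OtelLikeReporter`, and `loadReporter` are all hypothetical. Iceberg's own loading path (in `CatalogUtil`, as far as we know) may differ in details such as its default reporter and error handling.

```java
import java.util.Map;

// Hypothetical JDK-only model of choosing a reporter from catalog properties.
class ReporterLoaderSketch {
    static final String METRICS_REPORTER_IMPL = "metrics-reporter-impl";

    /** Hypothetical stand-in for Iceberg's MetricsReporter interface. */
    interface Reporter {
        /** Receives the full catalog property map after construction. */
        default void initialize(Map<String, String> properties) { }

        String describe();
    }

    /** Default used when the property is not set. */
    static final class LoggingReporter implements Reporter {
        public LoggingReporter() { }

        @Override
        public String describe() {
            return "logging";
        }
    }

    /** Stand-in for a configured class such as OtelMetricsReporter. */
    static final class OtelLikeReporter implements Reporter {
        private boolean initialized = false;

        public OtelLikeReporter() { }

        @Override
        public void initialize(Map<String, String> properties) {
            initialized = true;
        }

        @Override
        public String describe() {
            return "otel-like(initialized=" + initialized + ")";
        }
    }

    static Reporter loadReporter(Map<String, String> properties) {
        String impl = properties.get(METRICS_REPORTER_IMPL);
        if (impl == null) {
            return new LoggingReporter();
        }
        try {
            // Load by fully qualified (binary) class name and call the no-arg constructor.
            Class<? extends Reporter> clazz = Class.forName(impl).asSubclass(Reporter.class);
            Reporter reporter = clazz.getDeclaredConstructor().newInstance();
            reporter.initialize(properties);
            return reporter;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load metrics reporter " + impl, e);
        }
    }
}
```

Because only the class name travels through the catalog properties, the same property works no matter which engine creates the catalog.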
Endpoint, exporter, headers, resource attributes, and export intervals are
configured by the host application or via the standard OpenTelemetry
environment variables (`OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_SERVICE_NAME`,
`OTEL_EXPORTER_OTLP_HEADERS`, ...).
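For backends that need an auth header, such as the Bearer-auth setup under Validation, headers can also be supplied through `OTEL_EXPORTER_OTLP_HEADERS`. The OpenTelemetry specification describes this value as comma-separated `key=value` pairs in W3C Baggage style, with percent-encoded values. A space inside `Bearer <token>` is therefore written as `%20`. The parser below is a hypothetical helper (`parseOtlpHeaders`) that only illustrates the format. In practice the OpenTelemetry SDK or Java agent parses this variable itself.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the OTEL_EXPORTER_OTLP_HEADERS format.
// The real SDK/agent does its own parsing; this sketch is for illustration only.
class OtlpHeadersSketch {

    static Map<String, String> parseOtlpHeaders(String raw) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (raw == null || raw.isBlank()) {
            return headers;
        }
        for (String entry : raw.split(",")) {
            int eq = entry.indexOf('=');
            if (eq < 0) {
                continue; // no key=value separator: skip the entry
            }
            String key = entry.substring(0, eq).trim();
            if (key.isEmpty()) {
                continue; // skip entries without a key
            }
            // Split on the first '=' only, so values may themselves contain '='.
            // URLDecoder also maps '+' to a space, so a literal '+' must be sent as %2B.
            String value = URLDecoder.decode(entry.substring(eq + 1).trim(), StandardCharsets.UTF_8);
            headers.put(key, value);
        }
        return headers;
    }
}
```

For example, `Authorization=Bearer%20abc%2Bdef=` decodes to an `Authorization` header whose value is `Bearer abc+def=`.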
## Metric mapping
- `iceberg.scan.planning.duration` (histogram, ms)
- `iceberg.scan.result.{data_files,delete_files}` (sum)
- `iceberg.scan.data_manifests.{scanned,skipped}` (sum)
- `iceberg.scan.file_size.bytes` (sum, By)
- `iceberg.commit.duration` (histogram, ms)
- `iceberg.commit.{attempts,records.added}` (sum)
- `iceberg.commit.data_files.{added,removed}` (sum)
- `iceberg.commit.file_size.added_bytes` (sum, By)
Attributes: `iceberg.table.name`, `iceberg.snapshot.id`,
`iceberg.schema.id`, `iceberg.operation`.
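The mapping above can be pictured as a pure function from a report to a list of metric points. The sketch below is JDK-only. It uses hypothetical records (`ScanSummary`, `CommitSummary`, `MetricPoint`) in place of Iceberg's `ScanReport`/`CommitReport` and the OpenTelemetry instruments. It shows which metric name receives which value and which attributes are attached. It does not show how the real reporter records values. It also assumes every value is present, whereas real reports may leave some results absent.

```java
import java.util.List;
import java.util.Map;

// JDK-only sketch of the metric mapping: report summary -> (name, kind, value, attributes).
// ScanSummary/CommitSummary are hypothetical stand-ins for Iceberg's ScanReport/CommitReport.
class MetricMappingSketch {

    enum Kind { HISTOGRAM_MS, SUM, SUM_BYTES }

    record MetricPoint(String name, Kind kind, long value, Map<String, String> attributes) { }

    record ScanSummary(String table, long snapshotId, int schemaId, long planningMillis,
            long resultDataFiles, long resultDeleteFiles, long scannedDataManifests,
            long skippedDataManifests, long fileSizeBytes) { }

    record CommitSummary(String table, long snapshotId, String operation, long durationMillis,
            long attempts, long addedRecords, long addedDataFiles, long removedDataFiles,
            long addedFileSizeBytes) { }

    static List<MetricPoint> fromScan(ScanSummary s) {
        // Attribute keys follow the list above; values are rendered as strings here.
        Map<String, String> attrs = Map.of(
                "iceberg.table.name", s.table(),
                "iceberg.snapshot.id", Long.toString(s.snapshotId()),
                "iceberg.schema.id", Integer.toString(s.schemaId()));
        return List.of(
                new MetricPoint("iceberg.scan.planning.duration", Kind.HISTOGRAM_MS, s.planningMillis(), attrs),
                new MetricPoint("iceberg.scan.result.data_files", Kind.SUM, s.resultDataFiles(), attrs),
                new MetricPoint("iceberg.scan.result.delete_files", Kind.SUM, s.resultDeleteFiles(), attrs),
                new MetricPoint("iceberg.scan.data_manifests.scanned", Kind.SUM, s.scannedDataManifests(), attrs),
                new MetricPoint("iceberg.scan.data_manifests.skipped", Kind.SUM, s.skippedDataManifests(), attrs),
                new MetricPoint("iceberg.scan.file_size.bytes", Kind.SUM_BYTES, s.fileSizeBytes(), attrs));
    }

    static List<MetricPoint> fromCommit(CommitSummary c) {
        // Commit points carry the operation instead of the schema id.
        Map<String, String> attrs = Map.of(
                "iceberg.table.name", c.table(),
                "iceberg.snapshot.id", Long.toString(c.snapshotId()),
                "iceberg.operation", c.operation());
        return List.of(
                new MetricPoint("iceberg.commit.duration", Kind.HISTOGRAM_MS, c.durationMillis(), attrs),
                new MetricPoint("iceberg.commit.attempts", Kind.SUM, c.attempts(), attrs),
                new MetricPoint("iceberg.commit.records.added", Kind.SUM, c.addedRecords(), attrs),
                new MetricPoint("iceberg.commit.data_files.added", Kind.SUM, c.addedDataFiles(), attrs),
                new MetricPoint("iceberg.commit.data_files.removed", Kind.SUM, c.removedDataFiles(), attrs),
                new MetricPoint("iceberg.commit.file_size.added_bytes", Kind.SUM_BYTES, c.addedFileSizeBytes(), attrs));
    }
}
```

Keeping the mapping a pure function makes it easy to unit-test the names and attributes separately from any exporter, similar in spirit to the `InMemoryMetricReader`-based tests mentioned under Dependencies.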
## Dependencies
Only `io.opentelemetry:opentelemetry-api` is added to `iceberg-core`,
declared as `compileOnly`. **The OpenTelemetry SDK and OTLP exporters are not
added to the runtime classpath** — they come from the host application. Test
scope adds `opentelemetry-sdk` and `opentelemetry-sdk-testing` for
`InMemoryMetricReader`-based unit tests, plus `opentelemetry-exporter-otlp` for
the gated end-to-end smoke test.
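Because the API is `compileOnly`, a host that sets `metrics-reporter-impl` without providing `opentelemetry-api` on its classpath would likely hit a class-loading error (for example `NoClassDefFoundError`) when the reporter is loaded. A host could guard against that with a plain reflection check like the hypothetical `isClassPresent` helper below. The probed name `io.opentelemetry.api.GlobalOpenTelemetry` is the API entry point named in the Design section.

```java
// Hypothetical host-side guard: is a class loadable, without running its static initializers?
class ClasspathCheckSketch {
    static final String OTEL_API_CLASS = "io.opentelemetry.api.GlobalOpenTelemetry";

    static boolean isClassPresent(String className, ClassLoader loader) {
        try {
            // initialize=false loads the class but does not run its static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    static boolean isOtelApiPresent() {
        return isClassPresent(OTEL_API_CLASS, ClasspathCheckSketch.class.getClassLoader());
    }
}
```

A host could, for instance, add `metrics-reporter-impl` to its catalog properties only when `isOtelApiPresent()` returns true.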
## Validation
Validated end-to-end against two completely different OTLP backends, using
the same reporter class without modification:
1. **Databricks Zerobus Ingest** (OTLP/gRPC, Bearer auth) — metrics land
directly in a Unity Catalog Delta table; verified with SQL aggregations
matching injected values exactly.
2. **Amazon CloudWatch** (OTLP/HTTP, SigV4 via OTel Collector) — same
reporter, same metric names, same attributes; verified via PromQL `sum by()`
and ratio queries.
In both cases the host process built and registered an `OpenTelemetrySdk`
(with the appropriate exporter and headers) before initializing Iceberg's
reporter.
## Disclosure
Per the project's [AI-assisted contribution
guidelines](https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions),
I used Claude Code to help draft and prototype this work. I reviewed every
change by hand and ran the full test/lint loop locally before each iteration;
the validation results above are from my own runs against real backends. The
design discussion happened in #16169.
cc @ebyhr @singhpk234 @jbonofre — happy to address any feedback.