Thanks, Ryan! Yes — opentelemetry-api is the only thing on the runtime classpath, and the host owns the SDK and exporter, so there are no surprises for users who don't opt in.
Q1 (the compileOnly dependency) looks settled, and I've heard no objection to keeping it in iceberg-core for Q2 (module placement), so I'll proceed on that basis.

JB, looking forward to your PR review whenever you get a chance.

Nori

On Thu, May 28, 2026 at 5:51 AM Ryan Blue <[email protected]> wrote:

> Using `compileOnly` sounds right to me. Thanks for the explanation!
>
> On Tue, May 26, 2026 at 6:02 PM Noritaka Sekiyama via dev <
> [email protected]> wrote:
>
>> Hi Grant, and all,
>>
>> Thanks for sharing the data point - cardinality from per-table attributes
>> is exactly the kind of real-world failure mode the design should account
>> for, and your experience is fair.
>>
>> I pushed commit 5d867e49d to #16250 that addresses this by making the
>> attribute set configurable and giving users more control over cardinality.
>> A new catalog property iceberg.otel.metrics.attributes accepts a
>> comma-separated allowlist of attribute short names (table-name, schema-id,
>> operation). Attributes whose short names are not listed are omitted from
>> emitted metric points. The default attribute set is table-name and
>> operation; schema-id is opt-in. Workloads with thousands of tables can
>> flip table-name off and keep operation-level aggregates when it is
>> preferred.
>>
>> For users who want to keep iceberg.table.name but only for a subset of
>> tables, I've also filed #16573 to propose a framework-level table-name
>> filter that would apply uniformly across all MetricsReporter
>> implementations - complementary to the per-reporter attribute pruning
>> above. This would also address your concern.
>>
>> On the span-based reporter suggestion: I took some time to think through
>> whether it makes sense to layer that into this PR or as a sibling reporter
>> alongside OtelMetricsReporter.
>> I'd like to defer it, mainly because
>> emitting OpenTelemetry spans through the MetricsReporter callback feels
>> semantically off - MetricsReporter fires after the operation has
>> finished, so the reporter would have to synthesize spans retroactively from
>> the report's duration rather than open and close them at the real operation
>> boundaries, and the class name MetricsReporter emitting traces is itself
>> a friction point. The natural home for span-based observability is probably
>> an Iceberg-side instrumentation hook in the scan planner / commit code
>> paths that opens spans at the real boundaries, which is a larger design
>> discussion that I'd want to handle as a separate Issue / PR rather than
>> bolting onto this one.
>>
>> For #16250 specifically, my preference is to keep it as a metrics-only
>> reporter with the control above.
>>
>> Thanks,
>> Nori
>>
>> On Tue, May 26, 2026 at 1:14 AM Grant Nicholas <
>> [email protected]> wrote:
>>
>>> +1 with OTEL implementation of MetricsReporter, but have you considered
>>> a span-based implementation instead of/in addition to a metrics-based
>>> implementation?
>>>
>>> High cardinality metrics should be avoided and (schema_name,
>>> table_name) attributes can be high cardinality depending on your workload.
>>> Spans do not have problems with high cardinality.
>>>
>>> For context, we built a metrics-based MetricsReporter, ran into high
>>> cardinality cost issues with thousands of tables, then switched to a
>>> span-based MetricsReporter.
>>>
>>> On Mon, May 25, 2026 at 2:08 AM Noritaka Sekiyama via dev <
>>> [email protected]> wrote:
>>>
>>>> Hi JB, and all,
>>>>
>>>> Thanks for the suggestion. Pushed efc48d429 which adds an
>>>> OtelMetricsReporter section to docs/docs/metrics-reporting.md.
>>>> It documents
>>>> the host's responsibility for packaging the OpenTelemetry API, SDK, and a
>>>> metric exporter (Gradle plus a spark-submit --packages example), the
>>>> programmatic SDK registration path, exporter-wiring examples for the
>>>> OpenTelemetry Collector, Prometheus (pull and push), and Amazon CloudWatch
>>>> via the sigv4auth Collector extension, plus the emitted metric names and
>>>> attribute set.
>>>>
>>>> Verified end-to-end against the Prometheus pull pattern from the docs
>>>> (host SDK with PrometheusHttpServer + OtelMetricsReporter reporting
>>>> synthetic ScanReport/CommitReport, all 12 iceberg.* series visible on
>>>> /metrics with the documented attribute set); each Collector YAML in the
>>>> docs was otelcol-contrib validate-checked.
>>>>
>>>> Looking forward to your PR review.
>>>>
>>>> Thanks,
>>>> Nori
>>>>
>>>> On Mon, May 25, 2026 at 3:00 PM Jean-Baptiste Onofré <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I think this is a great proposal.
>>>>>
>>>>> I would suggest documenting how to package the exporter, as I believe
>>>>> it is up to the user to package the specific OpenTelemetry exporter they
>>>>> need.
>>>>>
>>>>> I will take a look at the PR.
>>>>>
>>>>> Regards,
>>>>> JB
>>>>>
>>>>> On Thu, May 21, 2026 at 3:39 AM Noritaka Sekiyama via dev <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'd like to propose adding an OpenTelemetry-based MetricsReporter to
>>>>>> iceberg-core that exports ScanReport and CommitReport to any
>>>>>> OTLP-compatible backend.
>>>>>>
>>>>>> # Background
>>>>>> Iceberg ships three built-in MetricsReporter implementations today:
>>>>>> LoggingMetricsReporter, InMemoryMetricsReporter (Spark-internal), and
>>>>>> RESTMetricsReporter (REST catalog only).
>>>>>> None of them give users an out-of-the-box way to ship scan/commit
>>>>>> metrics to an external observability platform.
>>>>>> The gap applies to Spark users on non-REST catalogs and to all
>>>>>> non-Spark engines (Trino, Flink, etc.).
>>>>>>
>>>>>> # Motivation
>>>>>> OpenTelemetry is the vendor-neutral CNCF standard for telemetry,
>>>>>> supported by every major observability backend (Prometheus, CloudWatch,
>>>>>> Datadog, Grafana Cloud, etc.).
>>>>>> A single OTLP-based MetricsReporter in Iceberg lets users reach all
>>>>>> of these without per-vendor integrations in the project.
>>>>>> This is complementary to #14360, which adds OTel support to HTTPClient
>>>>>> at the REST-catalog HTTP layer; this proposal covers the
>>>>>> Iceberg-level ScanReport / CommitReport layer.
>>>>>>
>>>>>> # Proposal
>>>>>> Issue: https://github.com/apache/iceberg/issues/16169
>>>>>> PR: https://github.com/apache/iceberg/pull/16250
>>>>>>
>>>>>> The reporter follows the same SDK-ownership philosophy as #14360 -
>>>>>> the host application (Spark/Flink/Trino/...) registers an
>>>>>> OpenTelemetrySdk via GlobalOpenTelemetry, and the reporter just looks
>>>>>> up a Meter from it.
>>>>>> The reporter has zero Iceberg-specific catalog properties; everything
>>>>>> else is owned by the host.
>>>>>>
>>>>>> The PR has been validated end-to-end against two unrelated OTLP
>>>>>> backends (Databricks Zerobus and Amazon CloudWatch) - full procedures and
>>>>>> queries are linked from the PR.
>>>>>>
>>>>>> # On dependencies
>>>>>> Given the current sensitivity around new runtime dependencies in
>>>>>> 1.11, the PR adds only opentelemetry-api to iceberg-core as compileOnly.
>>>>>> The OpenTelemetry SDK and OTLP exporters are not added to the runtime
>>>>>> classpath - they come from the host application.
>>>>>> opentelemetry-sdk / -sdk-testing are testImplementation only.
>>>>>>
>>>>>> # Questions for the community
>>>>>>
>>>>>> Q1. Any objection to taking the opentelemetry-api compileOnly
>>>>>> dependency in iceberg-core?
>>>>>> Q2. Module placement: iceberg-core (current PR), or a
>>>>>> separate iceberg-opentelemetry module?
>>>>>>
>>>>>> Thanks,
>>>>>> Noritaka Sekiyama, Databricks
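
For readers following the attribute allowlist described in the May 26 message, here is a minimal sketch of how such pruning could work. It is not the code from #16250. The property name iceberg.otel.metrics.attributes, the short names (table-name, schema-id, operation), and the default set (table-name and operation) come from the thread; the class AttributeAllowlist, its method names, the use of short names as map keys, and the treatment of an explicitly empty value as "emit no attributes" are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not taken from PR #16250.
final class AttributeAllowlist {
  // Catalog property name as given in the thread.
  static final String PROPERTY = "iceberg.otel.metrics.attributes";

  // Default set as stated in the thread: table-name and operation; schema-id is opt-in.
  static final Set<String> DEFAULTS = Set.of("table-name", "operation");

  private final Set<String> allowed;

  private AttributeAllowlist(Set<String> allowed) {
    this.allowed = allowed;
  }

  // Parses the comma-separated allowlist. A missing property falls back to the defaults.
  // Assumption: a present-but-empty value means "emit no attributes".
  static AttributeAllowlist fromProperties(Map<String, String> catalogProperties) {
    String raw = catalogProperties.get(PROPERTY);
    if (raw == null) {
      return new AttributeAllowlist(DEFAULTS);
    }
    Set<String> names = new LinkedHashSet<>();
    for (String part : raw.split(",")) {
      String name = part.trim();
      if (!name.isEmpty()) {
        names.add(name);
      }
    }
    return new AttributeAllowlist(names);
  }

  // Keeps only the candidate attributes whose short name is on the allowlist.
  // Assumption: candidates are keyed by short name, which simplifies the real attribute keys.
  Map<String, String> filter(Map<String, String> candidates) {
    Map<String, String> kept = new LinkedHashMap<>();
    candidates.forEach((shortName, value) -> {
      if (allowed.contains(shortName)) {
        kept.put(shortName, value);
      }
    });
    return kept;
  }

  public static void main(String[] args) {
    Map<String, String> candidates = new LinkedHashMap<>();
    candidates.put("table-name", "db.events");
    candidates.put("schema-id", "3");
    candidates.put("operation", "append");

    // Many-tables workload: drop table-name, keep operation-level aggregates.
    AttributeAllowlist operationOnly =
        AttributeAllowlist.fromProperties(Map.of(PROPERTY, "operation"));
    System.out.println(operationOnly.filter(candidates));
  }
}
```

With the property set to "operation", every metric point carries a single attribute, so the number of series no longer grows with the number of tables. That is the cardinality trade-off Grant raised, expressed as configuration.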
