Hi folks,

Thanks again for the discussion today. I updated the sync doc with notes from today's metrics sync:
https://docs.google.com/document/d/100h7c4damrUzVuquYbBHM0EvA4LSWuW2IT2dN_7nYVA/edit?tab=t.ezk0rgdx0c6m

A few highlights:

- No objection to keeping the default metrics behavior small: no-op/log-only is enough as the built-in default.
- Durable storage, event forwarding, external queues, dashboards, and custom filtering should be implementation choices behind the metrics reporting path.
- For PR #4115, the REST layer should resolve, authorize, and accept the report, then delegate to the selected metrics implementation. It should not own durable storage semantics.
- The rough boundary we discussed is: spec owns the wire contract, runtime owns framework wiring, API/contract modules own provider-facing contracts, and extensions own replaceable implementations.
- OpenLineage has a similar path-naming question. We should avoid occupying a generic lineage path too early if we want room for other lineage systems later.

I believe the boundary rules previewed in the sync meeting are useful for reviewing PR #4115, but they are not a final module-layout proposal.

For metrics specifically, I think the target remains:

runtime -> metrics reporting/emitting contract -> default implementation -> lower implementation details, if needed

That keeps the default simple, while allowing durable JDBC, event-backed, or external-queue implementations to be added separately.

We also touched on the split between Iceberg metrics emitting and Polaris-owned metrics querying. The querying side can keep evolving, especially for dashboard or generic-table use cases, but I do not think that needs to block the intake path in this PR.

Thanks,
-ej

On Tue, Jun 16, 2026 at 2:17 PM Yufei Gu <[email protected]> wrote:

> Thanks for chiming in, EJ. Agreed that the default battery should stay small. I'm leaning toward not including metrics persistence in the default battery.
>
> Yufei
>
>
> On Thu, Jun 11, 2026 at 8:30 PM EJ Wang <[email protected]> wrote:
>
> > Hi Yufei,
> >
> > Thanks for connecting this back to the REST endpoint proposal. I replied with the fuller version on the event-forwarding thread, but wanted to add the shorter version here too.
> >
> > I agree that metrics reporting/emitting is the right conceptual boundary, and I also think the event/listener path is a good implementation direction to explore. The distinction I want to keep clear is default battery vs extension implementation.
> >
> > For the REST/API side, I would keep the semantics narrow: Polaris resolves and authorizes the table, accepts the Iceberg scan/commit metrics report into the configured reporting/emitting path, and returns 204. That 204 should mean Polaris accepted the report into the ingestion path; it should not imply durable storage.
> >
> > Durable storage, event forwarding, external telemetry routing, filtering, and retention should sit behind that reporting/emitting boundary as implementation-layer behavior. A durable JDBC path can be one named extension implementation. An event/listener forwarding path can be another named extension implementation.
> >
> > So I am +1 on the event-forwarding idea as a non-default extension implementation. I just would not make it the default battery or the core API shape. The default battery can stay small and safe, while deployments that need forwarding, dashboards, or durable storage can opt into the implementation that matches their operational model.
> >
> > Thanks,
> > -ej
> >
> > On Wed, Jun 10, 2026 at 11:43 AM Yufei Gu <[email protected]> wrote:
> >
> > > Thanks EJ. I agree that the reporting/emitting boundary feels like the right SPI boundary, but I wonder if we can simplify this even further.
> > >
> > > Iceberg scan and commit metrics look very similar to events to me. They are append-only, asynchronously consumed, often forwarded to external systems, and have similar retention and cleanup requirements. More details can be found in this thread [1]. In that case, Polaris may only need to accept the report and emit an event through the existing event framework. Filtering, async delivery, custom sinks, and retention mechanisms could then be shared instead of introducing a separate metrics-specific extension layer and SPI.
> > >
> > > The main requirement I still see is querying metrics through Polaris. If we want that battery-included experience, we could work on a persistence listener implementation that selectively saves metrics and expose it via the IRC event endpoint, which seems pretty close in the Iceberg community. Another option I like more is to allow users to develop their own listener, so they can persist scan/commit metrics to any system they prefer. In general, that feels like an implementation choice on top of the event framework rather than something that needs to shape the core architecture.
> > >
> > > 1. https://lists.apache.org/thread/x9j8nscvy8hq61tyn01mj8yp6n9of0kp
> > >
> > > Thanks,
> > > Yufei
> > >
> > >
> > > On Thu, Jun 4, 2026 at 5:14 PM EJ Wang <[email protected]> wrote:
> > >
> > > > Good point. I agree the existing PolarisMetricsReporter is already very close to the right conceptual boundary. I am not proposing a second parallel reporter concept. The distinction I am trying to make is narrower:
> > > >
> > > > Current state:
> > > >
> > > > - PolarisMetricsReporter is the existing metrics reporting hook.
> > > > - But it currently lives under runtime/service.
> > > > - Its contract is documented around Quarkus/CDI discovery and @Identifier selection.
> > > > - The default implementation is log-only.
> > > > - The durable implementation writes through PolarisMetricsManager and MetricsPersistence.
> > > > - MetricsPersistence is currently inherited by BasePersistence.
> > > >
> > > > My proposal:
> > > >
> > > > - Keep the reporting/emitting boundary as the SPI.
> > > > - Revise that contract in a framework-agnostic optional Iceberg *metrics extension API layer*.
> > > > - Keep runtime/service responsible for REST ingestion, table/authz validation, and runtime wiring.
> > > > - Keep the battery implementation log-only in the metrics extension API layer, *not under runtime/service*.
> > > > - Treat durable JDBC metrics as one implementation of the reporting SPI.
> > > > - Decompose the MetricsPersistence and BasePersistence coupling as durable implementation detail.
> > > >
> > > > So the difference is not “new reporter concept vs old reporter concept,” but:
> > > >
> > > > - PolarisMetricsReporter today = runtime/service CDI-shaped hook.
> > > > - Target reporting SPI = same conceptual boundary, but placed in the right extension API layer and kept framework-agnostic.
> > > >
> > > > For the current metrics PR, I think the minimal architectural cleanup before merge is to *put the reporting SPI contract and the battery default in the right place*. The durable JDBC implementation, event-backed async implementation, external queue support, and deeper persistence cleanup can continue as follow-ups, as long as we do not lock the SPI boundary to BasePersistence or to runtime/service wiring.
> > > >
> > > > -ej
> > > >
> > > > On Wed, Jun 3, 2026 at 3:00 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > >
> > > > > Hi EJ,
> > > > >
> > > > > Thanks for the recap / summary!
> > > > >
> > > > > > Do folks agree that the stable SPI boundary should be metrics reporting/emitting, not metrics persistence?
> > > > >
> > > > > SGTM.
> > > > >
> > > > > > Does an optional Iceberg metrics extension API layer sound like the right home for this SPI?
> > > > >
> > > > > How is that different from current PolarisMetricsReporter?
> > > > >
> > > > > > Should the current durable metrics work be reframed as a durable JDBC reference implementation of that SPI?
> > > > >
> > > > > SGTM.
> > > > >
> > > > > > What is the smallest PR sequence to get there without blocking the current metrics work unnecessarily?
> > > > >
> > > > > Let's get https://github.com/apache/polaris/pull/4397 merged first.
> > > > >
> > > > > Then, I'd propose isolating the Metrics schema from the MetaStore schema. This will probably have some ripple effect into the bootstrap workflows, so it's not a trivial change.
> > > > >
> > > > > Then, let's reassess.
> > > > >
> > > > > Cheers,
> > > > > Dmitri.
> > > > >
> > > > > On Wed, Jun 3, 2026 at 5:40 PM EJ Wang <[email protected]> wrote:
> > > > >
> > > > > > Hi folks,
> > > > > >
> > > > > > Thanks again for the discussion today. I updated the sync doc with notes from the third metrics architecture sync:
> > > > > >
> > > > > > https://docs.google.com/document/d/100h7c4damrUzVuquYbBHM0EvA4LSWuW2IT2dN_7nYVA/edit?tab=t.k96s2xyqr5u1
> > > > > >
> > > > > > A few highlights from today’s discussion:
> > > > > >
> > > > > > - We clarified that Iceberg metrics reporting can be interpreted either as sync or async handling from Polaris’s perspective, so Polaris should stay flexible as a platform instead of baking one handling model into the REST API behavior.
> > > > > > - We aligned that the built-in (battery) behavior for metrics emitting can stay simple: no-op/log-only is enough as the default.
> > > > > > - Durable metrics persistence should be treated as an implementation of the metrics reporting path, not as the core SPI boundary.
> > > > > > - The existing durable metrics work can be reviewed as a reference implementation of the reporting SPI, with persistence-related logic kept self-contained under a metrics durable implementation module rather than scattered through core entity persistence.
> > > > > > - Dashboard/insights remains a real use case, but we agreed to keep it separate from the core metrics intake discussion for now.
> > > > > >
> > > > > > I also did a quick source check after the meeting to make sure we are describing the current state accurately.
> > > > > >
> > > > > > Current state:
> > > > > >
> > > > > > - Polaris already has a metrics reporting hook: PolarisMetricsReporter.
> > > > > > - The default implementation is DefaultMetricsReporter, selected by polaris.iceberg-metrics.reporting.type=default. It is log-only, effectively quiet unless metrics logging is enabled.
> > > > > > - There is also a PersistingMetricsReporter, selected by polaris.iceberg-metrics.reporting.type=persisting, which converts Iceberg scan/commit reports into Polaris metrics records and writes through PolarisMetricsManager -> MetricsPersistence.
> > > > > > - MetricsPersistence currently exists in the persistence layer and is inherited by BasePersistence.
> > > > > >
> > > > > > My read from the discussion is that the target boundary should not be MetricsPersistence as inherited by BasePersistence. That path should be decomposed as durable implementation detail (taken care of by https://github.com/apache/polaris/pull/4397). The stable extension point should instead be the metrics reporting/emitting boundary.
> > > > > >
> > > > > > Concretely, I think the next proposal should be shaped like this:
> > > > > >
> > > > > > 1. Define the proper metrics reporting SPI at an optional Iceberg metrics extension API layer.
> > > > > >
> > > > > >    Example direction:
> > > > > >
> > > > > >    public interface IcebergMetricsReporter {
> > > > > >      void reportMetric(IcebergMetricsReportContext context, MetricsReport report);
> > > > > >    }
> > > > > >
> > > > > >    The context would carry the small Polaris-resolved envelope: catalog/table identity, report type, received timestamp, and request/trace context if available. The raw Iceberg MetricsReport remains the payload.
> > > > > >
> > > > > > 2. Keep runtime/service as the REST ingestion and wiring layer.
> > > > > >
> > > > > >    The REST handler still resolves the table and performs authz before accepting the report. After that, it calls the selected reporting implementation.
> > > > > >
> > > > > > 3. Keep the battery default no-op/log-only.
> > > > > >
> > > > > >    This preserves an out-of-box safe default and avoids requiring a durable metrics store for every Polaris deployment.
> > > > > >
> > > > > > 4. Move durable JDBC metrics into a self-contained implementation of the reporting SPI.
> > > > > >
> > > > > >    That implementation can own its schema, bootstrap, retention, and read API support.
> > > > > >    It should not define the core reporting SPI boundary, and it should not require metrics persistence to remain inherited from BasePersistence.
> > > > > >
> > > > > > 5. Treat async/event-backed handling as another implementation of the same reporting SPI.
> > > > > >
> > > > > >    For example, an event-backed reporter could enqueue a metrics event and let listeners handle durable storage or other sinks. If we later need a replaceable queue engine, that seems like a shared event/metrics substrate topic rather than a metrics-only requirement.
> > > > > >
> > > > > > This framing lets us keep the REST metrics endpoint simple, preserve the current default behavior, support durable metrics users, and still leave room for async/event-backed or external-queue-based implementations.
> > > > > >
> > > > > > I think the main follow-up questions are:
> > > > > >
> > > > > > - Do folks agree that the stable SPI boundary should be metrics reporting/emitting, not metrics persistence?
> > > > > > - Does an optional Iceberg metrics extension API layer sound like the right home for this SPI?
> > > > > > - Should the current durable metrics work be reframed as a durable JDBC reference implementation of that SPI?
> > > > > > - What is the smallest PR sequence to get there without blocking the current metrics work unnecessarily?
> > > > > >
> > > > > > Thanks,
> > > > > > -ej
> > > > > >
> > > > > > On Fri, May 15, 2026 at 5:16 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > >
> > > > > > > Hi JB,
> > > > > > >
> > > > > > > Could you set up another meeting, please? Same time on Wednesday as last time... I hope it works for everyone.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Dmitri.
> > > > > > >
> > > > > > > On Fri, May 15, 2026 at 8:06 PM Yufei Gu <[email protected]> wrote:
> > > > > > >
> > > > > > > > +1 on another sync call next week.
> > > > > > > >
> > > > > > > > Yufei
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, May 15, 2026 at 4:52 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > WDYT about another sync call next week?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Dmitri.
> > > > > > > > >
> > > > > > > > > On Wed, May 6, 2026 at 5:29 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi EJ,
> > > > > > > > > >
> > > > > > > > > > Thanks for the summary! It covers what we discussed in the meeting very well, IMHO.
> > > > > > > > > >
> > > > > > > > > > Looking forward to concrete PRs :)
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Dmitri.
> > > > > > > > > >
> > > > > > > > > > On Wed, May 6, 2026 at 5:08 PM EJ Wang <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi folks,
> > > > > > > > > > >
> > > > > > > > > > > We had a community sync earlier, thanks JB for scheduling it. Notes from the first metrics architecture sync (May 6, 10-11am PT).
> > > > > > > > > > > Discussion doc with per-section status:
> > > > > > > > > > >
> > > > > > > > > > > https://docs.google.com/document/d/100h7c4damrUzVuquYbBHM0EvA4LSWuW2IT2dN_7nYVA/edit?tab=t.0
> > > > > > > > > > >
> > > > > > > > > > > *The meeting covered both topics from the doc. Direction-level alignment was reached on the headline pieces; details remain for PR review or follow-up sessions.*
> > > > > > > > > > >
> > > > > > > > > > > *Topic 1: Persistence schema redesign*
> > > > > > > > > > > Idea-level alignment on consolidating per-type tables (scan_metrics_report, commit_metrics_report) into a single metrics_report table. The motivating cost is the surface area added by every new metric type today: new table, SPI method, record class, model, converter, schema migration.
> > > > > > > > > > >
> > > > > > > > > > > Most schema details are deferred to the schema PR. A few specific points came up:
> > > > > > > > > > >    • metric_schema_version: Yufei prefers dropping it, since there is no spec-level concept of metrics versioning today and it is hard to define unilaterally. Robert prefers keeping it, given IRC v2 is coming and the schema should be considered against its likely shape; Robert also raised how to differentiate various payload formats if any. EJ's read is that this is a two-way-door decision. We can start without the field, and if IRC v2 changes the shape we would likely roll a corresponding new schema anyway, which is not particularly costly.
> > > > > > > > > > >    • Payload format: Robert pointed out that future formats beyond JSON may be worth supporting. The exact shape is deferred to the schema discussion.
> > > > > > > > > > >    • Partition strategy: Anand suggested monthly partitioning based on his experience as potentially helpful at scale.
> > > > > > > > > > >
> > > > > > > > > > > *Topic 2: Where metrics ingestion and storage belong*
> > > > > > > > > > > Idea-level alignment that metrics should be a separated SPI from the entity persistence stack. Two reasons surfaced: (a) workloads and capability requirements diverge enough that coupling them creates artificial constraints, and (b) admin experience improves when metrics has its own bootstrap, retention, and lifecycle. Dmitri noted Polaris being a platform should have the flexibility to support different persistence backends per concern, and pointed to a concrete next step of separating the JDBC bootstrap for metrics from the metastore bootstrap. Robert proposed an additional UX extension: detect an unbootstrapped metrics store on first use and auto-bootstrap rather than requiring an explicit manual bootstrap step.
> > > > > > > > > > > The meeting also confirmed that Polaris metrics can start small and stay Iceberg-focused. Naming and persistence schema can lean Iceberg-specific. If a future expansion to generic-table metrics or operational metrics arrives, an abstraction layer can be built on top of the Iceberg metrics reporter at that point. Robert remains on the fence and would prefer something more generic but did not block the direction; Dmitri's read was that the proposed framework already has enough flexibility to absorb future expansion.
> > > > > > > > > > >
> > > > > > > > > > > The Trade-offs and Proposed structure sections in the doc were not reviewed in detail. They remain open for either the next sync or PR review.
> > > > > > > > > > >
> > > > > > > > > > > *Cross-cutting alignment: battery-included plus pluggable*
> > > > > > > > > > > A common philosophy emerged from the discussion. EJ summarized it as: Polaris should provide a battery-included UX for beginners and the flexibility for advanced users to swap the included battery for something more powerful or tailored to their use case. The SPI design needs to enable both.
> > > > > > > > > > >
> > > > > > > > > > > The inputs that shaped this framing:
> > > > > > > > > > >    • Anand described how his team uses the current metrics persistence (three metrics consumers in v1.4).
> > > > > > > > > > >    • Yufei raised Grafana and dashboard integrations as a destination use case beyond the default.
> > > > > > > > > > >    • Robert called out that the current design is more JDBC-focused.
> > > > > > > > > > >
> > > > > > > > > > > Two concrete instances:
> > > > > > > > > > >    • Async metrics intake: Yufei's initial position was that async should largely live on the producer side and there is not much Polaris can do. Robert suggested a Polaris-side default is doable via Vert.x. Dmitri agreed the direction is worth exploring. The meeting converged on a battery-included default (likely Vert.x-backed) with an SPI shape that lets power users route to a more scalable backend (k8s-hosted queue, AWS SQS, etc.).
> > > > > > > > > > >    • Pluggable destinations: combining Yufei's dashboard use case with Robert's JDBC-focused call-out, the meeting agreed the SPI should be structured for multiple sinks so integrations become impl choices rather than architectural changes.
> > > > > > > > > > >
> > > > > > > > > > > The battery-included default is most likely to use the existing JDBC-backed approach.
> > > > > > > > > > >
> > > > > > > > > > > *Direction (idea-level alignment)*
> > > > > > > > > > >    • Single metrics_report table consolidating per-type metrics, replacing scan_metrics_report and commit_metrics_report
> > > > > > > > > > >    • Iceberg-focused naming and schema for now, revisit if generic-table or operational metrics arrive
> > > > > > > > > > >    • Metrics persistence as a separated SPI, not on BasePersistence
> > > > > > > > > > >    • Bootstrap path separated for metrics, independent of metastore bootstrap
> > > > > > > > > > >    • "Battery-included plus pluggable" as the SPI design philosophy
> > > > > > > > > > >
> > > > > > > > > > > *Open items*
> > > > > > > > > > >    • Schema details: metric_schema_version, payload format, IRC v2 forward-compat shape
> > > > > > > > > > >    • SPI design details: full review either in the next sync or in the corresponding PR
> > > > > > > > > > >    • Schema refactor PR ownership
> > > > > > > > > > >
> > > > > > > > > > > *Action items*
> > > > > > > > > > >    • EJ to take a first stab at the SPI design and potentially partner with Anand to incorporate the lessons learned from the existing reporter and persistence work.
> > > > > > > > > > >    • Schema refactor PR ownership is not yet decided. If anyone is interested in driving it, reply on this thread.
> > > > > > > > > > >    • JB to schedule the next sync, tentatively in two weeks.
> > > > > > > > > >> > > > > > > > > > >> -ej > > > > > > > > > >> > > > > > > > > > >> On Mon, Apr 27, 2026 at 3:07 PM EJ Wang < > > > > > > > > [email protected] > > > > > > > > > > > > > > > > > > > >> wrote: > > > > > > > > > >> > > > > > > > > > >> > Thanks Yufei for the +1. > > > > > > > > > >> > > > > > > > > > > >> > JB, could you help add a biweekly metrics architecture > > > sync > > > > to > > > > > > the > > > > > > > > > >> Polaris > > > > > > > > > >> > community calendar? I'm thinking Thursdays at 9-10am > PT, > > > on > > > > > the > > > > > > > > > >> off-weeks > > > > > > > > > >> > from the community meeting (starting May 7), 60 > minutes. > > > > > > > > > >> > > > > > > > > > > >> > Here's a rough agenda to work through over the first > few > > > > > > sessions, > > > > > > > > > >> grouped > > > > > > > > > >> > by priority: > > > > > > > > > >> > > > > > > > > > > >> > *First: foundational direction* > > > > > > > > > >> > > > > > > > > > > >> > 1. MetricsPersistence: public SPI or internal > > > > implementation > > > > > > > > detail? > > > > > > > > > >> > • Marked @Beta, javadoc calls it a "Service > > Provider > > > > > > > > Interface", > > > > > > > > > >> but > > > > > > > > > >> > only one consumer (JdbcBasePersistenceImpl), lives on > > > > > > > > BasePersistence. > > > > > > > > > >> If > > > > > > > > > >> > demoted to a private helper inside a persisting > reporter > > > > impl, > > > > > > > most > > > > > > > > > >> > downstream design decisions become implementation > > details > > > > > rather > > > > > > > > than > > > > > > > > > >> > contract questions. > > > > > > > > > >> > > > > > > > > > > >> > 2. Persistence schema redesign > > > > > > > > > >> > • Current two-table layout (scan_metrics_report, > > > > > > > > > >> > commit_metrics_report) with ~25 flattened columns > each. 
> > > > Every > > > > > > new > > > > > > > > > metric > > > > > > > > > >> > type requires a new table, SPI method, record class, > > > model, > > > > > > > > converter, > > > > > > > > > >> and > > > > > > > > > >> > schema migration. Direction to explore: single table > > with > > > > > > > > metric_type > > > > > > > > > >> enum, > > > > > > > > > >> > schema_version, and JSON payload column. > > > > > > > > > >> > > > > > > > > > > >> > *Second: design details once direction is set* > > > > > > > > > >> > > > > > > > > > > >> > 3. Partition key strategy > > > > > > > > > >> > • Single-table design means scan metrics at scale > > > will > > > > > have > > > > > > > > high > > > > > > > > > >> > write concurrency per table. Schema needs to expose > > enough > > > > > > > structure > > > > > > > > > for > > > > > > > > > >> > backends to shard by entity or time range. > > > > > > > > > >> > > > > > > > > > > >> > 4. Read/write path consistency > > > > > > > > > >> > • Writes go through PolarisMetricsManager on > > > > > > > MetaStoreManager. > > > > > > > > > >> Reads > > > > > > > > > >> > bypass MetaStoreManager and go straight to > > > BasePersistence, > > > > > > > > excluding > > > > > > > > > >> > non-JDBC backends from the read API. > > > > > > > > > >> > > > > > > > > > > >> > *Third: cleanup and alignment* > > > > > > > > > >> > > > > > > > > > > >> > 5. PolarisMetricsReporter naming > > > > > > > > > >> > • Only handles IRC (ScanReport/CommitReport), > > doesn't > > > > > cover > > > > > > > > > generic > > > > > > > > > >> > tables or operational metrics. Name is broader than > > scope. > > > > > > > > > >> > > > > > > > > > > >> > 6. PolarisMetricsManager facade passthrough > > > > > > > > > >> > • Entire default method is > > > > > > > > > >> callCtx.getMetaStore().writeScanReport(). > > > > > > > > > >> > Zero logic, passes Level 1 straight through to Level > 3. 
> > > Same > > > > > > > > > >> anti-pattern > > > > > > > > > >> > as PolarisEventManager. > > > > > > > > > >> > > > > > > > > > > >> > 7. Iceberg community alignment > > > > > > > > > >> > • Payload-type extension needs discussion on > > > > dev@iceberg. > > > > > > > > > >> obelix74's > > > > > > > > > >> > Feb thread got zero replies. Needs a committer voice. > > > > > > > > > >> > > > > > > > > > > >> > Lets confirm prioritization in the first session. > > > > > > > > > >> > > > > > > > > > > >> > -ej > > > > > > > > > >> > > > > > > > > > > >> > On Tue, Apr 21, 2026 at 3:18 PM Yufei Gu < > > > > > [email protected]> > > > > > > > > > wrote: > > > > > > > > > >> > > > > > > > > > > >> >> Thanks everyone for continuing to drive this > forward. I > > > > agree > > > > > > > that > > > > > > > > > the > > > > > > > > > >> >> problem is getting complex enough that a more > > structured > > > > > > > discussion > > > > > > > > > >> would > > > > > > > > > >> >> help. > > > > > > > > > >> >> > > > > > > > > > >> >> +1 on setting up a biweekly sync for the metrics > > > > > architecture. > > > > > > > I’m > > > > > > > > > >> happy > > > > > > > > > >> >> to > > > > > > > > > >> >> join. > > > > > > > > > >> >> > > > > > > > > > >> >> Yufei > > > > > > > > > >> >> > > > > > > > > > >> >> > > > > > > > > > >> >> On Tue, Apr 21, 2026 at 2:34 PM EJ Wang < > > > > > > > > > >> [email protected]> > > > > > > > > > >> >> wrote: > > > > > > > > > >> >> > > > > > > > > > >> >> > Also, I've been looking more closely at the > > > *persistence > > > > > > schema > > > > > > > > in > > > > > > > > > >> the > > > > > > > > > >> >> > current metrics work*, and I think there's a > > structural > > > > > > > rigidity > > > > > > > > > >> problem > > > > > > > > > >> >> > worth raising before the shape gets locked in. 
Right now we have two separate tables (scan_metrics_report and commit_metrics_report), each with ~25 flattened columns that directly mirror the Iceberg report fields. The SPI follows the same split: writeScanReport and writeCommitReport as separate methods, with per-type record classes, converters, and model objects. *The practical cost: adding a new metric type (operational metrics, for example) requires a new table, a new SPI method, a new record class, a new model class, a new converter branch, and a schema migration*. That's a lot of surface area for what should be "one more kind of metric."

*My bias* would be toward a single metrics table with *a typed JSON payload*. Something like: metric_type (enum), entity_id, table_identifier, snapshot_id (nullable), received_ts, schema_version, and a payload column for the metric-specific data. The metric_type + schema_version pair gives us a forward-compatible contract for the payload shape.
Adding a new metric type becomes an enum value and a payload schema, not a schema migration. One thing I think we need to be deliberate about is the partition key design. If all metric types land in one table, scan metrics at scale (high concurrency, high frequency across many tables) could easily create hot partitions. We'd want the persistence layer to be able to shard by entity or time range, and that means the logical schema needs to expose enough structure for backends to partition on. I don't think the current flattened layout gives us that.

This is getting complex enough that I don't think ad-hoc PR/ML threads will converge well. *Would people be open to a biweekly sync for metrics architecture?* I think 30 minutes every two weeks with interested parties would be enough to work through the schema, SPI shape, and read API design together. Happy to help set that up.
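
To make the single-table idea above concrete, here is a minimal Java sketch of one logical row. It is an illustration only: MetricType, MetricRecord, payloadContract, and partitionKey are hypothetical names that do not exist in Polaris, and the "entity id plus UTC day bucket" partition key is just one assumed way to expose enough structure for a backend to shard by entity or time range.

```java
import java.time.Instant;
import java.util.Objects;

// Hypothetical sketch: none of these types exist in Polaris.
enum MetricType { SCAN, COMMIT, OPERATIONAL }

/** One logical row of the proposed single metrics table. */
record MetricRecord(
    MetricType metricType,
    long entityId,
    String tableIdentifier,
    Long snapshotId,      // nullable, e.g. for metrics not tied to a snapshot
    Instant receivedTs,
    int schemaVersion,
    String payload) {     // metric-specific data, stored as JSON text

  MetricRecord {
    Objects.requireNonNull(metricType, "metricType");
    Objects.requireNonNull(tableIdentifier, "tableIdentifier");
    Objects.requireNonNull(receivedTs, "receivedTs");
    Objects.requireNonNull(payload, "payload");
  }

  /** Identifies the payload shape: metric type plus schema version. */
  String payloadContract() {
    return metricType + "/v" + schemaVersion;
  }

  /** Assumed partition key: entity id plus a UTC day bucket. */
  String partitionKey() {
    long dayBucket = Math.floorDiv(receivedTs.getEpochSecond(), 86_400L);
    return entityId + ":" + dayBucket;
  }
}
```

Under this sketch, adding a new metric type would mean adding an enum constant and agreeing on a payload schema for it, while the row shape and the partition key stay unchanged.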
-ej

On Mon, Apr 20, 2026 at 2:19 PM EJ Wang <[email protected]> wrote:

Reviewed #4115, left a comment on the code organization side.

One thing stood out: the metrics write path enters through PolarisMetricsManager on MetaStoreManager, but the new read path bypasses MetaStoreManager entirely and goes straight to BasePersistence via callContext.getMetaStore(). That means the read API only works for backends that implement BasePersistence. NoSQL and remote backends can't participate.

Stepping back, I think the metrics subsystem is growing into something real (write + read + REST API + AuthZ + pagination) *but the persistence side is split across two layers in a way that's hard to extend*. I put together two diagrams to show what I mean (my best effort).

*Current state* (Diagram 1): three interfaces at three different levels. The engine-facing SPI (PolarisMetricsReporter) is clean.
But PolarisMetricsManager on MetaStoreManager is a passthrough to MetricsPersistence on BasePersistence. The @Beta annotation and SPI javadoc are on the BasePersistence layer, while the actual extension points (PolarisMetricsReporter, PolarisMetricsManager) carry no stability annotation. The write path goes through the MetaStoreManager layer; the read path doesn't.

*What I envision* (Diagram 2): two SPIs at two levels. PolarisMetricsReporter stays as the engine-facing SPI. PolarisMetricsManager becomes the backend-facing SPI with both write and read methods at the MetaStoreManager level, where any backend (JDBC, NoSQL, remote) can implement them. MetricsPersistence on BasePersistence goes away. Where metrics actually land is an implementation detail, not a core interface.

*Minor naming thing*: PolarisMetricsReporter is broader than what it actually handles. It only accepts Iceberg REST Catalog metrics (ScanReport, CommitReport via MetricsReport).
Generic table metrics or operational metrics aren't in scope. Not blocking, but worth noting if the metrics surface expands.

*Rough sketch of how to get there*:
   1. Add read methods to PolarisMetricsManager (listScanReports, listCommitReports) with default no-op, same as the existing write methods. (Probably also make PolarisMetricsManager more explicitly Iceberg-specific, e.g. through its package name or class name.)
   2. Wire MetricsReportsService through MetaStoreManager instead of callContext.getMetaStore().
   3. Extract metrics persistence from JdbcBasePersistenceImpl into its own class. That file carries ~7 responsibilities, metrics being one of them.
   4. Remove MetricsPersistence from BasePersistence.

*None of this needs to happen in #4115. But if the direction makes sense, it would be good to align before the metrics surface grows further.
> > > > > > > > > >> >> Curious > > > > > > > > > >> >> >> what others think.* > > > > > > > > > >> >> >> > > > > > > > > > >> >> >> *My mental model note*: Level 1 MetaStoreManager; > > > level > > > > 2 > > > > > > > > > >> transactional > > > > > > > > > >> >> >> persistence; level 3 base persistence > > > > > > > > > >> >> >> > > > > > > > > > >> >> >> Diagram 1 > > > > > > > > > >> >> >> < > > > > > > > > > >> >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://www.plantuml.com/plantuml/uml/bLHDR-Cs4BthLmpIYupw0zbkKQ1r3M-S7Bp8xhhM7WCOb3IM65EaGD9EX2RzxHrHb4CxRelwa4YSDu_lpOVcnZ9jzvM8BBS2uGjQpJC3dtHMSekPtMk44IpsMgEqa5XcCOhCZikQQLP1pR8TAp2n3ILhmZDP20m0fcIvUkAoW2qJXd9z1bpToO9BX3WXu0ucy5rpgGPNm0nW5_epUWtm2Ue3pn3kMOFQmKntGZW0BYtgBSi8k5A2QMwybJNMIbFiGSR9QZc4nUqIvikStF0jHprua5C-amge42aNt3R0f5JaaoivdV2Pkqbx4hee4ymOkBh5BTiB-_uIeGeo8zL8rPsPl4DktdEiK1jkB1NdZCRbrSTecDe_mlHbF0wvBmCkaOH5_S8a_TTTKI6-nmCAkEw4LpxsZ-LbYLKQFKMNOgf_wuM7_bV9gOer5SYMMksBSWXFcbi49KNZXNLicwfe3TETC7gPdPqI7uBcHMb1RSzYq34c6PDUM9mn8HRsUTZEiDBve3NjVZumBj0U7SS37mGO7vcwtiK-_pU7U7L_f-digo9YbhSwIfMRwIITKGXbxdIUTCGF1SeCJxloKsU-3k9ddRbX1eDq1q_fx1JbBGT0glVyXimDuP4TQ5qpCAmnGEj2s_6n5mtn1z-97-63itFQZLPO1Ev2tu_WF7Ju-VPc0Skg5bYXxBhkY1xpD7EM_7fyflSpIsqMgVth5xhVr4eQxWQ8enaSAJQSG16yFSDuJ798rrcXr_3n-lfdk7icQjEBmFujL7AodiP_Y4Z7-YxvtZNs4zMgpNTl6tF8sglyPsmqchrjvQ-m-aP94r-TwCA2Ka8upPJZwtvSpoYCXkYMZU2NXvRMBfq9P3i3Le4VAZUAlUZ_oPKsxPgY0Q_BSKLkyr9bhQhQrJjo_x3TPlIB0DPjnMfcIoYP0QaYw1a0fTKDr8fB6ntNuvmoL1ZGkXa69Njh43zf9GiGxHQrA_jDYWRSzF5--WmTVrN97_Sm8LbLUy_lGBmLanJjFkDlGkRqjA_4tm00 > > > > > > > > > >> >> > > > > > > > > > > >> >> >> : > > > > > > > > > >> >> >> > > > > > > > > > >> >> >> [image: image.png] > > > > > > > > > >> >> >> > > > > > > > > > >> >> >> Diagram 2 > > > > > > > > > >> >> >> < > > > > > > > > > >> >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 
https://www.plantuml.com/plantuml/uml/VLLDR-8m4BtdLupO2sWBLVU8AaGB7AXAbssGzb896SS9RXqxjKqBwkv_tt7iV43fdaZYDpFlpRmnOsE9jhjSH9PRmM31hERKm8scMsuPjJlDe0yheZDc8RR4iYWoBrmMH9CS2a9VICPYUy1OZN0YCy5Q0BCbYNhdCeEK28En8G8wCvbnoQ0R8_05Bc6bkLIz3X03p1zzH7zR-9ZfDquPt9C3qoNCX2yV4G2NbkcKu5jdgGJHt0GbZwnG6i-UP3TUpk5gM6Ldqke350eZUqzoCft3U9xWHvxoa5-7K4nF1J46EbEMafsmdrCBbQ44gVggy18IZrn_ph5asd1ZiIKdQSgueZvjXrQFSFrdC3YN-nXmBacxbGiYyLVxLaBtdhqn0LSzdBDhqQtQoOJeGyad3z0lUqnYgpGB6Ns8oVyta00Dy_WnX0tIOZ8v6SYxHll1TrH6aejAik-mh-AphVFCwSUQqFypElag5QRGFDjQKEd96K1P8QP41c9TzA_IIQyvdAWyv_RSiS3skb0_EzDDkK2v5xWF6MiGFlvhpFLcD2Dq2pml14gaF67eQkmd8gulDoC4kSOu6KVpkvlUJg1RTbWISU40RdBUUS_9XfRZ2dwxm_SW8LYFISgm_MnlDQ6M9P1gbKEc4X-2pH_FvJCkCqm9pbVjD6LrwdLeOrDWfOaqc8Wh9BE85oNKxkNQ6o4yGRy_Eae0G_G8tZv81d3bHDB23WOdisohVr3nh_j6lbSjbNaLRTc8UgtPbAU1J_tygOfZX9DWEJeHDvYx-qmSi5FgNLPZwHrHcUsncGQ5-skhUclpE5fo4ounpFauYrUbkU6ccfnxMvitwag4IyerhTxj8In_Oj1bDO4pQru674loYrGlULHLEGCjwJJ8gDoVZR8MxO4BT3IzRvIcAQKezC6xpziGnTyImrfEGyJI_OcKfgtxIvnTqFEMS17L9Z-jsARN5FmTheP7HtSdtOMT0B4GY2FYHXxgQmMtj2bRqiLFGapiVe1_QVKDrkqXcm83aFEXnMYCZ-xlyHy > > > > > > > > > >> >> > > > > > > > > > > >> >> >> : > > > > > > > > > >> >> >> [image: image.png] > > > > > > > > > >> >> >> > > > > > > > > > >> >> >> -ej > > > > > > > > > >> >> >> > > > > > > > > > >> >> >> On Wed, Apr 15, 2026 at 8:22 AM Dmitri > Bourlatchkov > > < > > > > > > > > > >> [email protected]> > > > > > > > > > >> >> >> wrote: > > > > > > > > > >> >> >> > > > > > > > > > >> >> >>> Hi All, > > > > > > > > > >> >> >>> > > > > > > > > > >> >> >>> Heads up: The current state of PR [4115] looks > > pretty > > > > > solid > > > > > > > to > > > > > > > > > me. > > > > > > > > > >> I > > > > > > > > > >> >> >>> believe this PR is approaching a mergeable > > condition. > > > > > > > > > >> >> >>> > > > > > > > > > >> >> >>> Please post your reviews if you have any > comments. 
[4115] https://github.com/apache/polaris/pull/4115

Thanks,
Dmitri.

On Tue, Mar 3, 2026 at 3:29 PM Anand Kumar Sankaran via dev <[email protected]> wrote:

Hi Yufei and Dmitri,

Here is a proposal for the REST endpoints for metrics and events.

https://github.com/apache/polaris/pull/3924/changes

I did not see any precedent for raising a PR for proposals, so I am trying this. Please let me know what you think.

-
Anand

From: Anand Kumar Sankaran <[email protected]>
Date: Monday, March 2, 2026 at 10:25 AM
To: [email protected] <[email protected]>
Subject: Re: Polaris Telemetry and Audit Trail

About the REST API, based on my use cases:

   1. I want to be able to query commit metrics to track files added / removed per commit, along with record counts.
The ingestion pipeline that writes this data is owned by us and we are guaranteed to write this information for each write.
   2. I want to be able to query scan metrics for read. I understand clients do not fulfill this requirement.
   3. I want to be able to query the events table (events are persisted) - this may supersede #2, I am not sure yet.

All this information is in the JDBC based persistence model and is persisted in the metastore. I currently don’t have a need to query Prometheus or OpenTelemetry. I do publish some events to Prometheus and they are forwarded to our dashboards elsewhere.

About the CLI utilities, I meant the admin user utilities.
In one of the earliest drafts of my proposal, Prashant mentioned that the metrics tables can grow indefinitely and that a similar problem exists with the events table as well. We discussed that cleaning up of old records from both metrics tables and events tables can be done via a CLI utility.

I see that Yufei has covered the discussion about datasources.

-
Anand

From: Yufei Gu <[email protected]>
Date: Friday, February 27, 2026 at 9:54 PM
To: [email protected] <[email protected]>
Subject: Re: Polaris Telemetry and Audit Trail
As I mentioned in https://github.com/apache/polaris/issues/3890, supporting multiple data sources is not a trivial change. I would strongly recommend starting with a design document to carefully evaluate the architectural implications and long term impact.

A REST endpoint to query metrics seems reasonable given the current JDBC based persistence model. That said, we may also consider alternative storage models.
For example, if we later adopt a time series system such as Prometheus to store metrics, the query model and access patterns would be fundamentally different. Designing the REST API without considering these potential evolutions may limit flexibility. I'd suggest starting with the use case.

Yufei

On Fri, Feb 27, 2026 at 3:42 PM Dmitri Bourlatchkov <[email protected]> wrote:

Hi Anand,

Sharing my view... subject to discussion:

1. Adding a non-IRC REST API to Polaris is perfectly fine.

Figuring out specific endpoint URIs and payloads might require a few roundtrips, so opening a separate thread for that might be best. Contributors commonly create Google Docs for new API proposals too (they are fairly easy to update as the email discussion progresses).
There was a suggestion to try Markdown (with PRs) for proposals [1] ... feel free to give it a try if you are comfortable with that.

2. Could you clarify whether you mean end user utilities or admin user utilities? In the latter case those might be more suitable for the Admin CLI (Java), not the Python CLI, IMHO.

Why would these utilities be common with events? IMHO, event use cases are distinct from scan/commit metrics.

3. I'd prefer separating metrics persistence from MetaStore persistence at the code level, so that they could be mixed and matched independently. The separate datasource question will become a non-issue with that approach, I guess.
The rationale for separating scan metrics and metastore persistence is that "cascading deletes" between them are hardly ever required. Furthermore, the data and query patterns are very different, so different technologies might be beneficial in each case.

[1] https://lists.apache.org/thread/yto2wp982t43h1mqjwnslswhws5z47cy

Cheers,
Dmitri.

On Fri, Feb 27, 2026 at 6:19 PM Anand Kumar Sankaran via dev <[email protected]> wrote:

Thanks all. This PR is merged now.

Here are the follow-up features / work needed. These were all part of the merged PR at some point in time and were removed to reduce scope.
Please let me know what you think.

   1. A REST API to paginate through table metrics. This will be a non-IRC standard addition.
   2. Utilities for managing old records, which should be common with events. There was some discussion that it belongs to the CLI.
   3. Separate datasource (metrics, events, even other tables?).

Anything else?

-
Anand
