Hi folks,

Thanks again for the discussion today. I updated the sync doc with notes from the third metrics architecture sync:
https://docs.google.com/document/d/100h7c4damrUzVuquYbBHM0EvA4LSWuW2IT2dN_7nYVA/edit?tab=t.k96s2xyqr5u1

A few highlights from today’s discussion:

- We clarified that Iceberg metrics reporting can be interpreted either as sync or async handling from Polaris’s perspective, so Polaris should stay flexible as a platform instead of baking one handling model into the REST API behavior.
- We aligned that the built-in (battery) behavior for metrics emitting can stay simple: no-op/log-only is enough as the default.
- Durable metrics persistence should be treated as an implementation of the metrics reporting path, not as the core SPI boundary.
- The existing durable metrics work can be reviewed as a reference implementation of the reporting SPI, with persistence-related logic kept self-contained under a metrics durable implementation module rather than scattered through core entity persistence.
- Dashboard/insights remains a real use case, but we agreed to keep it separate from the core metrics intake discussion for now.

I also did a quick source check after the meeting to make sure we are describing the current state accurately.

Current state:

- Polaris already has a metrics reporting hook: PolarisMetricsReporter.
- The default implementation is DefaultMetricsReporter, selected by polaris.iceberg-metrics.reporting.type=default. It is log-only, effectively quiet unless metrics logging is enabled.
- There is also a PersistingMetricsReporter, selected by polaris.iceberg-metrics.reporting.type=persisting, which converts Iceberg scan/commit reports into Polaris metrics records and writes through PolarisMetricsManager -> MetricsPersistence.
- MetricsPersistence currently exists in the persistence layer and is inherited by BasePersistence.

My read from the discussion is that the target boundary should not be MetricsPersistence as inherited by BasePersistence.
That path should be decomposed as a durable implementation detail (taken care of by https://github.com/apache/polaris/pull/4397). The stable extension point should instead be the metrics reporting/emitting boundary.

Concretely, I think the next proposal should be shaped like this:

1. Define the proper metrics reporting SPI at an optional Iceberg metrics extension API layer. Example direction:

    public interface IcebergMetricsReporter {
      void reportMetric(IcebergMetricsReportContext context, MetricsReport report);
    }

   The context would carry the small Polaris-resolved envelope: catalog/table identity, report type, received timestamp, and request/trace context if available. The raw Iceberg MetricsReport remains the payload.

2. Keep runtime/service as the REST ingestion and wiring layer. The REST handler still resolves the table and performs authz before accepting the report. After that, it calls the selected reporting implementation.

3. Keep the battery default no-op/log-only. This preserves an out-of-box safe default and avoids requiring a durable metrics store for every Polaris deployment.

4. Move durable JDBC metrics into a self-contained implementation of the reporting SPI. That implementation can own its schema, bootstrap, retention, and read API support. It should not define the core reporting SPI boundary, and it should not require metrics persistence to remain inherited from BasePersistence.

5. Treat async/event-backed handling as another implementation of the same reporting SPI. For example, an event-backed reporter could enqueue a metrics event and let listeners handle durable storage or other sinks. If we later need a replaceable queue engine, that seems like a shared event/metrics substrate topic rather than a metrics-only requirement.

This framing lets us keep the REST metrics endpoint simple, preserve the current default behavior, support durable metrics users, and still leave room for async/event-backed or external-queue-based implementations.
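
To make items 1, 3, and 5 a bit more concrete, here is a minimal, self-contained sketch of how the pieces could fit together. It is illustrative only, not a proposal for exact names or packages. Assumptions: MetricsReport below is a local stand-in for org.apache.iceberg.metrics.MetricsReport so the snippet compiles on its own; the fields of IcebergMetricsReportContext, the NoOpIcebergMetricsReporter and QueueingIcebergMetricsReporter classes, and the drop-on-full policy are all hypothetical.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for org.apache.iceberg.metrics.MetricsReport so the sketch compiles alone.
interface MetricsReport {}

// Hypothetical envelope resolved by Polaris before the reporter is called.
// requestId may be null when no request/trace context is available.
record IcebergMetricsReportContext(
    String catalogName,
    String tableIdentifier,
    String reportType,
    Instant receivedAt,
    String requestId) {}

// The SPI shape from item 1.
interface IcebergMetricsReporter {
  void reportMetric(IcebergMetricsReportContext context, MetricsReport report);
}

// Item 3: a battery default that accepts the report and does nothing with it.
final class NoOpIcebergMetricsReporter implements IcebergMetricsReporter {
  @Override
  public void reportMetric(IcebergMetricsReportContext context, MetricsReport report) {
    // Intentionally empty: no durable store is required for this default.
  }
}

// Item 5: an event-backed reporter that only enqueues; a listener drains later.
final class QueueingIcebergMetricsReporter implements IcebergMetricsReporter {
  record MetricsEvent(IcebergMetricsReportContext context, MetricsReport report) {}

  private final BlockingQueue<MetricsEvent> queue;
  private final AtomicLong dropped = new AtomicLong();

  QueueingIcebergMetricsReporter(int capacity) {
    this.queue = new LinkedBlockingQueue<>(capacity);
  }

  @Override
  public void reportMetric(IcebergMetricsReportContext context, MetricsReport report) {
    // offer() does not block the calling thread; a full queue drops the report and counts it.
    if (!queue.offer(new MetricsEvent(context, report))) {
      dropped.incrementAndGet();
    }
  }

  // Called by a (hypothetical) listener to hand pending events to a sink.
  List<MetricsEvent> drain() {
    List<MetricsEvent> batch = new ArrayList<>();
    queue.drainTo(batch);
    return batch;
  }

  long droppedCount() {
    return dropped.get();
  }
}
```

The point of the sketch is that the REST layer only ever sees IcebergMetricsReporter. Whether a report is discarded, logged, written via JDBC, or queued for a listener is a choice made by the selected implementation, so a durable JDBC reporter would be one more class next to these two.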

I think the main follow-up questions are:

- Do folks agree that the stable SPI boundary should be metrics reporting/emitting, not metrics persistence?
- Does an optional Iceberg metrics extension API layer sound like the right home for this SPI?
- Should the current durable metrics work be reframed as a durable JDBC reference implementation of that SPI?
- What is the smallest PR sequence to get there without blocking the current metrics work unnecessarily?

Thanks,
-ej

On Fri, May 15, 2026 at 5:16 PM Dmitri Bourlatchkov <[email protected]> wrote:

> Hi JB,
>
> Could you set up another meeting, please? Same time on Wednesday as last time... I hope it works for everyone.
>
> Cheers,
> Dmitri.
>
> On Fri, May 15, 2026 at 8:06 PM Yufei Gu <[email protected]> wrote:
>
> > +1 on another sync call next week.
> >
> > Yufei
> >
> >
> > On Fri, May 15, 2026 at 4:52 PM Dmitri Bourlatchkov <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > WDYT about another sync call next week?
> > >
> > > Thanks,
> > > Dmitri.
> > >
> > > On Wed, May 6, 2026 at 5:29 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > >
> > > > Hi EJ,
> > > >
> > > > Thanks for the summary! It covers what we discussed in the meeting very well, IMHO.
> > > >
> > > > Looking forward to concrete PRs :)
> > > >
> > > > Cheers,
> > > > Dmitri.
> > > >
> > > > On Wed, May 6, 2026 at 5:08 PM EJ Wang <[email protected]> wrote:
> > > >
> > > >> Hi folks,
> > > >>
> > > >> We had a community sync earlier, thanks JB for scheduling it. Notes from the first metrics architecture sync (May 6, 10-11am PT). Discussion doc with per-section status:
> > > >>
> > > >> https://docs.google.com/document/d/100h7c4damrUzVuquYbBHM0EvA4LSWuW2IT2dN_7nYVA/edit?tab=t.0
> > > >>
> > > >> *The meeting covered both topics from the doc.
> > > >> Direction-level alignment was reached on the headline pieces; details remain for PR review or follow-up sessions.*
> > > >>
> > > >> *Topic 1: Persistence schema redesign*
> > > >> Idea-level alignment on consolidating per-type tables (scan_metrics_report, commit_metrics_report) into a single metrics_report table. The motivating cost is the surface area added by every new metric type today: new table, SPI method, record class, model, converter, schema migration.
> > > >>
> > > >> Most schema details are deferred to the schema PR. A few specific points came up:
> > > >>    • metric_schema_version: Yufei prefers dropping it, since there is no spec-level concept of metrics versioning today and it is hard to define unilaterally. Robert prefers keeping it, given IRC v2 is coming and the schema should be considered against its likely shape; Robert also raised how to differentiate various payload formats if any. EJ's read is that this is a two-way-door decision. We can start without the field, and if IRC v2 changes the shape we would likely roll a corresponding new schema anyway, which is not particularly costly.
> > > >>    • Payload format: Robert pointed out that future formats beyond JSON may be worth supporting. The exact shape is deferred to the schema discussion.
> > > >>    • Partition strategy: Anand suggested monthly partitioning based on his experience as potentially helpful at scale.
> > > >>
> > > >> *Topic 2: Where metrics ingestion and storage belong*
> > > >> Idea-level alignment that metrics should be a separated SPI from the entity persistence stack.
> > > >> Two reasons surfaced: (a) workloads and capability requirements diverge enough that coupling them creates artificial constraints, and (b) admin experience improves when metrics has its own bootstrap, retention, and lifecycle. Dmitri noted Polaris being a platform should have the flexibility to support different persistence backends per concern, and pointed to a concrete next step of separating the JDBC bootstrap for metrics from the metastore bootstrap. Robert proposed an additional UX extension: detect an unbootstrapped metrics store on first use and auto-bootstrap rather than requiring an explicit manual bootstrap step.
> > > >> The meeting also confirmed that Polaris metrics can start small and stay Iceberg-focused. Naming and persistence schema can lean Iceberg-specific. If a future expansion to generic-table metrics or operational metrics arrives, an abstraction layer can be built on top of the Iceberg metrics reporter at that point. Robert remains on the fence and would prefer something more generic but did not block the direction; Dmitri's read was that the proposed framework already has enough flexibility to absorb future expansion.
> > > >>
> > > >> The Trade-offs and Proposed structure sections in the doc were not reviewed in detail. They remain open for either the next sync or PR review.
> > > >>
> > > >> *Cross-cutting alignment: battery-included plus pluggable*
> > > >> A common philosophy emerged from the discussion. EJ summarized it as: Polaris should provide a battery-included UX for beginners and the flexibility for advanced users to swap the included battery for something more powerful or tailored to their use case. The SPI design needs to enable both.
> > > >>
> > > >> The inputs that shaped this framing:
> > > >>    • Anand described how his team uses the current metrics persistence (three metrics consumers in v1.4).
> > > >>    • Yufei raised Grafana and dashboard integrations as a destination use case beyond the default.
> > > >>    • Robert called out that the current design is more JDBC-focused.
> > > >>
> > > >> Two concrete instances:
> > > >>    • Async metrics intake: Yufei's initial position was that async should largely live on the producer side and there is not much Polaris can do. Robert suggested a Polaris-side default is doable via Vert.x. Dmitri agreed the direction is worth exploring. The meeting converged on a battery-included default (likely Vert.x-backed) with an SPI shape that lets power users route to a more scalable backend (k8s-hosted queue, AWS SQS, etc.).
> > > >>    • Pluggable destinations: combining Yufei's dashboard use case with Robert's JDBC-focused call-out, the meeting agreed the SPI should be structured for multiple sinks so integrations become impl choices rather than architectural changes.
> > > >>
> > > >> The battery-included default is most likely to use the existing JDBC-backed approach.
> > > >>
> > > >> *Direction (idea-level alignment)*
> > > >>    • Single metrics_report table consolidating per-type metrics, replacing scan_metrics_report and commit_metrics_report
> > > >>    • Iceberg-focused naming and schema for now, revisit if generic-table or operational metrics arrive
> > > >>    • Metrics persistence as a separated SPI, not on BasePersistence
> > > >>    • Bootstrap path separated for metrics, independent of metastore bootstrap
> > > >>    • "Battery-included plus pluggable" as the SPI design philosophy
> > > >>
> > > >> *Open items*
> > > >>    • Schema details: metric_schema_version, payload format, IRC v2 forward-compat shape
> > > >>    • SPI design details: full review either in the next sync or in the corresponding PR
> > > >>    • Schema refactor PR ownership
> > > >>
> > > >> *Action items*
> > > >>    • EJ to take a first stab at the SPI design and potentially partner with Anand to incorporate the lessons learned from the existing reporter and persistence work.
> > > >>    • Schema refactor PR ownership is not yet decided. If anyone is interested in driving it, reply on this thread.
> > > >>    • JB to schedule the next sync, tentatively in two weeks.
> > > >>
> > > >> -ej
> > > >>
> > > >> On Mon, Apr 27, 2026 at 3:07 PM EJ Wang <[email protected]> wrote:
> > > >>
> > > >> > Thanks Yufei for the +1.
> > > >> >
> > > >> > JB, could you help add a biweekly metrics architecture sync to the Polaris community calendar? I'm thinking Thursdays at 9-10am PT, on the off-weeks from the community meeting (starting May 7), 60 minutes.
> > > >> >
> > > >> > Here's a rough agenda to work through over the first few sessions, grouped by priority:
> > > >> >
> > > >> > *First: foundational direction*
> > > >> >
> > > >> > 1. MetricsPersistence: public SPI or internal implementation detail?
> > > >> >      • Marked @Beta, javadoc calls it a "Service Provider Interface", but only one consumer (JdbcBasePersistenceImpl), lives on BasePersistence. If demoted to a private helper inside a persisting reporter impl, most downstream design decisions become implementation details rather than contract questions.
> > > >> >
> > > >> > 2. Persistence schema redesign
> > > >> >      • Current two-table layout (scan_metrics_report, commit_metrics_report) with ~25 flattened columns each. Every new metric type requires a new table, SPI method, record class, model, converter, and schema migration. Direction to explore: single table with metric_type enum, schema_version, and JSON payload column.
> > > >> >
> > > >> > *Second: design details once direction is set*
> > > >> >
> > > >> > 3. Partition key strategy
> > > >> >      • Single-table design means scan metrics at scale will have high write concurrency per table. Schema needs to expose enough structure for backends to shard by entity or time range.
> > > >> >
> > > >> > 4. Read/write path consistency
> > > >> >      • Writes go through PolarisMetricsManager on MetaStoreManager. Reads bypass MetaStoreManager and go straight to BasePersistence, excluding non-JDBC backends from the read API.
> > > >> >
> > > >> > *Third: cleanup and alignment*
> > > >> >
> > > >> > 5. PolarisMetricsReporter naming
> > > >> >      • Only handles IRC (ScanReport/CommitReport), doesn't cover generic tables or operational metrics. Name is broader than scope.
> > > >> >
> > > >> > 6. PolarisMetricsManager facade passthrough
> > > >> >      • Entire default method is callCtx.getMetaStore().writeScanReport(). Zero logic, passes Level 1 straight through to Level 3.
> > > >> >        Same anti-pattern as PolarisEventManager.
> > > >> >
> > > >> > 7. Iceberg community alignment
> > > >> >      • Payload-type extension needs discussion on dev@iceberg. obelix74's Feb thread got zero replies. Needs a committer voice.
> > > >> >
> > > >> > Let's confirm prioritization in the first session.
> > > >> >
> > > >> > -ej
> > > >> >
> > > >> > On Tue, Apr 21, 2026 at 3:18 PM Yufei Gu <[email protected]> wrote:
> > > >> >
> > > >> >> Thanks everyone for continuing to drive this forward. I agree that the problem is getting complex enough that a more structured discussion would help.
> > > >> >>
> > > >> >> +1 on setting up a biweekly sync for the metrics architecture. I’m happy to join.
> > > >> >>
> > > >> >> Yufei
> > > >> >>
> > > >> >>
> > > >> >> On Tue, Apr 21, 2026 at 2:34 PM EJ Wang <[email protected]> wrote:
> > > >> >>
> > > >> >> > Also, I've been looking more closely at the *persistence schema in the current metrics work*, and I think there's a structural rigidity problem worth raising before the shape gets locked in.
> > > >> >> >
> > > >> >> > Right now we have two separate tables (scan_metrics_report and commit_metrics_report), each with ~25 flattened columns that directly mirror the Iceberg report fields. The SPI follows the same split: writeScanReport and writeCommitReport as separate methods, with per-type record classes, converters, and model objects. *The practical cost: adding a new metric type (operational metrics, for example) requires a new table, a new SPI method, a new record class, a new model class, a new converter branch, and a schema migration*.
> > > >> >> > That's a lot of surface area for what should be "one more kind of metric."
> > > >> >> >
> > > >> >> > *My bias* would be toward a single metrics table with *a typed JSON payload*. Something like: metric_type (enum), entity_id, table_identifier, snapshot_id (nullable), received_ts, schema_version, and a payload column for the metric-specific data. The metric_type + schema_version pair gives us a forward-compatible contract for the payload shape. Adding a new metric type becomes an enum value and a payload schema, not a schema migration. One thing I think we need to be deliberate about is the partition key design. If all metric types land in one table, scan metrics at scale (high concurrency, high frequency across many tables) could easily create hot partitions. We'd want the persistence layer to be able to shard by entity or time range, and that means the logical schema needs to expose enough structure for backends to partition on. I don't think the current flattened layout gives us that.
> > > >> >> >
> > > >> >> > This is getting complex enough that I don't think ad-hoc PR/ML threads will converge well. *Would people be open to a biweekly sync for metrics architecture?* I think 30 minutes every two weeks with interested parties would be enough to work through the schema, SPI shape, and read API design together. Happy to help set that up.
> > > >> >> >
> > > >> >> > -ej
> > > >> >> >
> > > >> >> > On Mon, Apr 20, 2026 at 2:19 PM EJ Wang <[email protected]> wrote:
> > > >> >> >
> > > >> >> >> Reviewed #4115, left a comment on the code organization side.
> > > >> >> >>
> > > >> >> >> One thing stood out: the metrics write path enters through PolarisMetricsManager on MetaStoreManager, but the new read path bypasses MetaStoreManager entirely and goes straight to BasePersistence via callContext.getMetaStore(). That means the read API only works for backends that implement BasePersistence. NoSQL and remote backends can't participate.
> > > >> >> >>
> > > >> >> >> Stepping back, I think the metrics subsystem is growing into something real (write + read + REST API + AuthZ + pagination) *but the persistence side is split across two layers in a way that's hard to extend*. I put together two diagrams to show what I mean (my best effort).
> > > >> >> >>
> > > >> >> >> *Current state* (Diagram 1): three interfaces at three different levels. The engine-facing SPI (PolarisMetricsReporter) is clean. But PolarisMetricsManager on MetaStoreManager is a passthrough to MetricsPersistence on BasePersistence. The @Beta annotation and SPI javadoc are on the BasePersistence layer, while the actual extension points (PolarisMetricsReporter, PolarisMetricsManager) carry no stability annotation. The write path goes through the MetaStoreManager layer, the read path doesn't.
> > > >> >> >>
> > > >> >> >> *What I envision* (Diagram 2): two SPIs at two levels. PolarisMetricsReporter stays as the engine-facing SPI.
> > > >> >> >> PolarisMetricsManager becomes the backend-facing SPI with both write and read methods at the MetaStoreManager level, where any backend (JDBC, NoSQL, remote) can implement them. MetricsPersistence on BasePersistence goes away. Where metrics actually land is an implementation detail, not a core interface.
> > > >> >> >>
> > > >> >> >> *Minor naming thing*: PolarisMetricsReporter is broader than what it actually handles. It only accepts Iceberg REST Catalog metrics (ScanReport, CommitReport via MetricsReport). Generic table metrics or operational metrics aren't in scope. Not blocking, but worth noting if the metrics surface expands.
> > > >> >> >>
> > > >> >> >> *Rough sketch of how to get there*:
> > > >> >> >>    1. Add read methods to PolarisMetricsManager (listScanReports, listCommitReports) with default no-op, same as the existing write methods. (Probably make PolarisMetricsManager more explicit on being Iceberg specific like package name or class name etc.)
> > > >> >> >>    2. Wire MetricsReportsService through MetaStoreManager instead of callContext.getMetaStore().
> > > >> >> >>    3. Extract metrics persistence from JdbcBasePersistenceImpl into its own class. That file carries ~7 responsibilities, metrics being one of them.
> > > >> >> >>    4. Remove MetricsPersistence from BasePersistence.
> > > >> >> >>
> > > >> >> >> *None of this needs to happen in #4115. But if the direction makes sense, it would be good to align before the metrics surface grows further.
> > > >> >> Curious > > > >> >> >> what others think.* > > > >> >> >> > > > >> >> >> *My mental model note*: Level 1 MetaStoreManager; level 2 > > > >> transactional > > > >> >> >> persistence; level 3 base persistence > > > >> >> >> > > > >> >> >> Diagram 1 > > > >> >> >> < > > > >> >> > > > >> > > > > > > https://www.plantuml.com/plantuml/uml/bLHDR-Cs4BthLmpIYupw0zbkKQ1r3M-S7Bp8xhhM7WCOb3IM65EaGD9EX2RzxHrHb4CxRelwa4YSDu_lpOVcnZ9jzvM8BBS2uGjQpJC3dtHMSekPtMk44IpsMgEqa5XcCOhCZikQQLP1pR8TAp2n3ILhmZDP20m0fcIvUkAoW2qJXd9z1bpToO9BX3WXu0ucy5rpgGPNm0nW5_epUWtm2Ue3pn3kMOFQmKntGZW0BYtgBSi8k5A2QMwybJNMIbFiGSR9QZc4nUqIvikStF0jHprua5C-amge42aNt3R0f5JaaoivdV2Pkqbx4hee4ymOkBh5BTiB-_uIeGeo8zL8rPsPl4DktdEiK1jkB1NdZCRbrSTecDe_mlHbF0wvBmCkaOH5_S8a_TTTKI6-nmCAkEw4LpxsZ-LbYLKQFKMNOgf_wuM7_bV9gOer5SYMMksBSWXFcbi49KNZXNLicwfe3TETC7gPdPqI7uBcHMb1RSzYq34c6PDUM9mn8HRsUTZEiDBve3NjVZumBj0U7SS37mGO7vcwtiK-_pU7U7L_f-digo9YbhSwIfMRwIITKGXbxdIUTCGF1SeCJxloKsU-3k9ddRbX1eDq1q_fx1JbBGT0glVyXimDuP4TQ5qpCAmnGEj2s_6n5mtn1z-97-63itFQZLPO1Ev2tu_WF7Ju-VPc0Skg5bYXxBhkY1xpD7EM_7fyflSpIsqMgVth5xhVr4eQxWQ8enaSAJQSG16yFSDuJ798rrcXr_3n-lfdk7icQjEBmFujL7AodiP_Y4Z7-YxvtZNs4zMgpNTl6tF8sglyPsmqchrjvQ-m-aP94r-TwCA2Ka8upPJZwtvSpoYCXkYMZU2NXvRMBfq9P3i3Le4VAZUAlUZ_oPKsxPgY0Q_BSKLkyr9bhQhQrJjo_x3TPlIB0DPjnMfcIoYP0QaYw1a0fTKDr8fB6ntNuvmoL1ZGkXa69Njh43zf9GiGxHQrA_jDYWRSzF5--WmTVrN97_Sm8LbLUy_lGBmLanJjFkDlGkRqjA_4tm00 > > > >> >> > > > > >> >> >> : > > > >> >> >> > > > >> >> >> [image: image.png] > > > >> >> >> > > > >> >> >> Diagram 2 > > > >> >> >> < > > > >> >> > > > >> > > > > > > 
https://www.plantuml.com/plantuml/uml/VLLDR-8m4BtdLupO2sWBLVU8AaGB7AXAbssGzb896SS9RXqxjKqBwkv_tt7iV43fdaZYDpFlpRmnOsE9jhjSH9PRmM31hERKm8scMsuPjJlDe0yheZDc8RR4iYWoBrmMH9CS2a9VICPYUy1OZN0YCy5Q0BCbYNhdCeEK28En8G8wCvbnoQ0R8_05Bc6bkLIz3X03p1zzH7zR-9ZfDquPt9C3qoNCX2yV4G2NbkcKu5jdgGJHt0GbZwnG6i-UP3TUpk5gM6Ldqke350eZUqzoCft3U9xWHvxoa5-7K4nF1J46EbEMafsmdrCBbQ44gVggy18IZrn_ph5asd1ZiIKdQSgueZvjXrQFSFrdC3YN-nXmBacxbGiYyLVxLaBtdhqn0LSzdBDhqQtQoOJeGyad3z0lUqnYgpGB6Ns8oVyta00Dy_WnX0tIOZ8v6SYxHll1TrH6aejAik-mh-AphVFCwSUQqFypElag5QRGFDjQKEd96K1P8QP41c9TzA_IIQyvdAWyv_RSiS3skb0_EzDDkK2v5xWF6MiGFlvhpFLcD2Dq2pml14gaF67eQkmd8gulDoC4kSOu6KVpkvlUJg1RTbWISU40RdBUUS_9XfRZ2dwxm_SW8LYFISgm_MnlDQ6M9P1gbKEc4X-2pH_FvJCkCqm9pbVjD6LrwdLeOrDWfOaqc8Wh9BE85oNKxkNQ6o4yGRy_Eae0G_G8tZv81d3bHDB23WOdisohVr3nh_j6lbSjbNaLRTc8UgtPbAU1J_tygOfZX9DWEJeHDvYx-qmSi5FgNLPZwHrHcUsncGQ5-skhUclpE5fo4ounpFauYrUbkU6ccfnxMvitwag4IyerhTxj8In_Oj1bDO4pQru674loYrGlULHLEGCjwJJ8gDoVZR8MxO4BT3IzRvIcAQKezC6xpziGnTyImrfEGyJI_OcKfgtxIvnTqFEMS17L9Z-jsARN5FmTheP7HtSdtOMT0B4GY2FYHXxgQmMtj2bRqiLFGapiVe1_QVKDrkqXcm83aFEXnMYCZ-xlyHy > > > >> >> > > > > >> >> >> : > > > >> >> >> [image: image.png] > > > >> >> >> > > > >> >> >> -ej > > > >> >> >> > > > >> >> >> On Wed, Apr 15, 2026 at 8:22 AM Dmitri Bourlatchkov < > > > >> [email protected]> > > > >> >> >> wrote: > > > >> >> >> > > > >> >> >>> Hi All, > > > >> >> >>> > > > >> >> >>> Heads up: The current state of PR [4115] looks pretty solid > to > > > me. > > > >> I > > > >> >> >>> believe this PR is approaching a mergeable condition. > > > >> >> >>> > > > >> >> >>> Please post your reviews if you have any comments. > > > >> >> >>> > > > >> >> >>> [4115] https://github.com/apache/polaris/pull/4115 > > > >> >> >>> > > > >> >> >>> Thanks, > > > >> >> >>> Dmitri. 
> > > >> >> >>>
> > > >> >> >>> On Tue, Mar 3, 2026 at 3:29 PM Anand Kumar Sankaran via dev <[email protected]> wrote:
> > > >> >> >>>
> > > >> >> >>> > Hi Yufei and Dmitri,
> > > >> >> >>> >
> > > >> >> >>> > Here is a proposal for the REST endpoints for metrics and events.
> > > >> >> >>> >
> > > >> >> >>> > https://github.com/apache/polaris/pull/3924/changes
> > > >> >> >>> >
> > > >> >> >>> > I did not see any precursors for raising a PR for proposals, so trying this. Please let me know what you think.
> > > >> >> >>> >
> > > >> >> >>> > -
> > > >> >> >>> > Anand
> > > >> >> >>> >
> > > >> >> >>> > From: Anand Kumar Sankaran <[email protected]>
> > > >> >> >>> > Date: Monday, March 2, 2026 at 10:25 AM
> > > >> >> >>> > To: [email protected] <[email protected]>
> > > >> >> >>> > Subject: Re: Polaris Telemetry and Audit Trail
> > > >> >> >>> >
> > > >> >> >>> > About the REST API, based on my use cases:
> > > >> >> >>> >
> > > >> >> >>> >   1. I want to be able to query commit metrics to track files added / removed per commit, along with record counts. The ingestion pipeline that writes this data is owned by us and we are guaranteed to write this information for each write.
> > > >> >> >>> >   2. I want to be able to query scan metrics for read. I understand clients do not fulfill this requirement.
> > > >> >> >>> >   3. I want to be able to query the events table (events are persisted) - this may supersede #2, I am not sure yet.
> > > >> >> >>> >
> > > >> >> >>> > All this information is in the JDBC based persistence model and is persisted in the metastore.
> > > >> >> >>> > I currently don’t have a need to query Prometheus or OpenTelemetry. I do publish some events to Prometheus and they are forwarded to our dashboards elsewhere.
> > > >> >> >>> >
> > > >> >> >>> > About the CLI utilities, I meant the admin user utilities. In one of the earliest drafts of my proposal, Prashant mentioned that the metrics tables can grow indefinitely and that a similar problem exists with the events table as well. We discussed that cleaning up of old records from both metrics tables and events tables can be done via a CLI utility.
> > > >> >> >>> >
> > > >> >> >>> > I see that Yufei has covered the discussion about datasources.
> > > >> >> >>> >
> > > >> >> >>> > -
> > > >> >> >>> > Anand
> > > >> >> >>> >
> > > >> >> >>> >
> > > >> >> >>> >
> > > >> >> >>> > From: Yufei Gu <[email protected]>
> > > >> >> >>> > Date: Friday, February 27, 2026 at 9:54 PM
> > > >> >> >>> > To: [email protected] <[email protected]>
> > > >> >> >>> > Subject: Re: Polaris Telemetry and Audit Trail
> > > >> >> >>> >
> > > >> >> >>> > As I mentioned in
> > > >> >> >>> > https://urldefense.com/v3/__https://github.com/apache/polaris/issues/3890__;!!Iz9xO38YGHZK!5EuyFFkk3vhRWVIRvQAWBSQfpJkTMA9HxugzDwXmN0LPPqhEFxYkFRGVhtb8AqUwXtDh2OplcMnbMDHKOxrvDU0$ ,
> > > >> >> >>> > supporting multiple data sources is not a trivial change. I would strongly recommend starting with a design document to carefully evaluate the architectural implications and long term impact.
> > > >> >> >>> >
> > > >> >> >>> > A REST endpoint to query metrics seems reasonable given the current JDBC based persistence model. That said, we may also consider alternative storage models. For example, if we later adopt a time series system such as Prometheus to store metrics, the query model and access patterns would be fundamentally different. Designing the REST API without considering these potential evolutions may limit flexibility. I'd suggest starting with the use case.
> > > >> >> >>> >
> > > >> >> >>> > Yufei
> > > >> >> >>> >
> > > >> >> >>> >
> > > >> >> >>> > On Fri, Feb 27, 2026 at 3:42 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > >> >> >>> >
> > > >> >> >>> > > Hi Anand,
> > > >> >> >>> > >
> > > >> >> >>> > > Sharing my view... subject to discussion:
> > > >> >> >>> > >
> > > >> >> >>> > > 1. Adding non-IRC REST API to Polaris is perfectly fine.
> > > >> >> >>> > >
> > > >> >> >>> > > Figuring out specific endpoint URIs and payloads might require a few roundtrips, so opening a separate thread for that might be best. Contributors commonly create Google Docs for new API proposals too (they are fairly easy to update as the email discussion progresses).
> > > >> >> >>> > >
> > > >> >> >>> > > There was a suggestion to try Markdown (with PRs) for proposals [1] ... feel free to give it a try if you are comfortable with that.
> > > >> >> >>> > >
> > > >> >> >>> > > 2. Could you clarify whether you mean end user utilities or admin user utilities? In the latter case those might be more suitable for the Admin CLI (java) not the Python CLI, IMHO.
> > > >> >> >>> > >
> > > >> >> >>> > > Why would these utilities be common with events? IMHO, event use cases are distinct from scan/commit metrics.
> > > >> >> >>> > >
> > > >> >> >>> > > 3. I'd prefer separating metrics persistence from MetaStore persistence at the code level, so that they could be mixed and matched independently.
> > > >> >> >>> > > The separate datasource question will become a non-issue with that approach, I guess.
> > > >> >> >>> > >
> > > >> >> >>> > > The rationale for separating scan metrics and metastore persistence is that "cascading deletes" between them are hardly ever required. Furthermore, the data and query patterns are very different so different technologies might be beneficial in each case.
> > > >> >> >>> > >
> > > >> >> >>> > > [1]
> > > >> >> >>> > > https://urldefense.com/v3/__https://lists.apache.org/thread/yto2wp982t43h1mqjwnslswhws5z47cy__;!!Iz9xO38YGHZK!5EuyFFkk3vhRWVIRvQAWBSQfpJkTMA9HxugzDwXmN0LPPqhEFxYkFRGVhtb8AqUwXtDh2OplcMnbMDHKxYDakNU$
> > > >> >> >>> > >
> > > >> >> >>> > > Cheers,
> > > >> >> >>> > > Dmitri.
> > > >> >> >>> > >
> > > >> >> >>> > > On Fri, Feb 27, 2026 at 6:19 PM Anand Kumar Sankaran via dev <[email protected]> wrote:
> > > >> >> >>> > >
> > > >> >> >>> > > > Thanks all. This PR is merged now.
> > > >> >> >>> > > >
> > > >> >> >>> > > > Here are the follow-up features / work needed. These were all part of the merged PR at some point in time and were removed to reduce scope.
> > > >> >> >>> > > >
> > > >> >> >>> > > > Please let me know what you think.
> > > >> >> >>> > > >
> > > >> >> >>> > > >
> > > >> >> >>> > > >   1. A REST API to paginate through table metrics. This will be a non-IRC-standard addition.
> > > >> >> >>> > > >   2. Utilities for managing old records, should be common with events.
> > > >> >> >>> > > >      There was some discussion that it belongs to the CLI.
> > > >> >> >>> > > >   3. Separate datasource (metrics, events, even other tables?).
> > > >> >> >>> > > >
> > > >> >> >>> > > >
> > > >> >> >>> > > > Anything else?
> > > >> >> >>> > > >
> > > >> >> >>> > > > -
> > > >> >> >>> > > > Anand
