Thanks for the reply, Pierre and Oleg!
As I mentioned earlier, we could split metrics into two categories based on
how they are collected:
1. Handled directly by Polaris, e.g., metrics derived from the
metadata.json file or ingested through the IRC metrics endpoint
(/v1/{prefix}/namespaces/{namespace}/tables/{table}/metrics).
2. Requiring external services, e.g., file size distribution and
partition metrics.
This proposal seems to address both. For category (2), we’ll need a clear
design for how Polaris integrates with an external service, covering
aspects such as workload lifecycle management (triggering, state control,
etc.). That said, I think it would be reasonable to narrow the initial
scope to category (1). Could we clarify that in the proposal?
On persistence, I believe the most critical part of the design is the
schema. Once the schema is defined, the SPI details could be derived
relatively easily. One important factor we shouldn’t overlook is the type
of database we want to leverage: a time series database (TSDB) or a
general-purpose OLTP database. This choice will heavily influence schema
design. For instance, a TSDB schema must include a timestamp, metric name,
and dimensions, while an OLTP schema is more generic and flexible. The
choice also affects the SPI design. For example, aggregations, rollups, and
sliding windows are first-class operations in a TSDB, while joins are
better supported in an OLTP database. Could we add this consideration to the
design document?
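To make the schema point concrete, here is a rough sketch of what a
TSDB-shaped record and a matching SPI could look like. Everything in it is an
assumption for discussion only: MetricPoint, MetricsStore, rollup_avg, and
the in-memory implementation are hypothetical names, not existing Polaris or
Iceberg interfaces.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class MetricPoint:
    """TSDB-style sample: timestamp, metric name, dimensions, and a value."""
    timestamp_ms: int
    name: str                                   # hypothetical metric name
    value: float
    dimensions: tuple[tuple[str, str], ...] = ()  # e.g. (("table", "db.t1"),)


class MetricsStore(Protocol):
    """Hypothetical persistence SPI; a real backend would implement this."""

    def write(self, point: MetricPoint) -> None: ...

    def rollup_avg(self, name: str, window_ms: int) -> dict[int, float]: ...


class InMemoryMetricsStore:
    """Toy implementation, only to show the shape of the operations."""

    def __init__(self) -> None:
        self._points: list[MetricPoint] = []

    def write(self, point: MetricPoint) -> None:
        self._points.append(point)

    def rollup_avg(self, name: str, window_ms: int) -> dict[int, float]:
        # Group samples into fixed windows keyed by the window start time,
        # then average each window (a basic rollup).
        buckets: dict[int, list[float]] = defaultdict(list)
        for p in self._points:
            if p.name == name:
                start = p.timestamp_ms - (p.timestamp_ms % window_ms)
                buckets[start].append(p.value)
        return {s: sum(v) / len(v) for s, v in sorted(buckets.items())}
```

With a TSDB behind the SPI, something like rollup_avg would be pushed down to
the database. With an OLTP backend, the same call would likely become a
GROUP BY query over a more generic table, so the SPI may need to expose fewer
such operations or make them optional.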
I'm also open to a dedicated discussion if needed. WDYT?
Yufei
On Tue, Sep 30, 2025 at 3:00 PM Oleg Soloviov <[email protected]> wrote:
> Hello!
>
> Just want to add my 2 cents here.
> Using external engines to handle heavy I/O operations looks well justified.
> On the other hand it was mentioned that there are metrics that could be
> handled by Polaris itself, though someone needs to control when they are
> calculated, pushed to storage etc.
>
> Maybe we could use an approach in between the 1st and 2nd version of
> Pierre's proposal:
> - Polaris controls which metrics are calculated and when (benefitting from
> event listeners).
> - Polaris delegates computation to external engines if needed (SPI or
> API?).
> - Polaris pushes metrics to persistent storage via SPI.
> - Polaris provides Open API to request metrics, force refresh them;
> probably we could still allow to push externally calculated metrics as
> well.
>
> This way we will have a more centralized setup for different kinds of
> metrics, better control the granularity of incremental updates (we probably
> will not want to update metrics after each small commit). Also it will
> provide some flexibility, e.g. there were some proposals in the Iceberg
> community to enrich metadata with some metrics like histograms, so Polaris
> could choose to calculate internally vs externally if those metrics are
> available in metadata.
> On the downside, of course, such a solution will complicate Polaris'
> runtime, but imo it is still worth considering.
>
> Oleg
>
> On Mon, Sep 29, 2025 at 4:48 PM Pierre Laporte <[email protected]>
> wrote:
>
> > Hi Yufei, thanks for the feedback
> >
> > > Just to confirm, this is not a Polaris event listener but the Iceberg
> > > Event REST endpoint (WIP), right? If we are using the Polaris event
> > > listener, we still have to figure out the protocol between the
> > > delegation service clients and servers, which are described in
> > > William's doc.
> > >
> >
> > The proposal does not include any sort of triggering system. So there is
> > no single answer to your question. I was merely trying to explore
> > possible implementation ideas. But keep in mind that this can come at a
> > later time, as we first have to define how Polaris defines operational
> > metrics and deals with them, before we can consider how external systems
> > could integrate them.
> >
> > > To be clear, I think using the Iceberg Event REST endpoint is a good
> > > idea, it decouples the external service nicely, but we may have to wait
> > > for a while, as it's still WIP in the Iceberg community.
> > >
> >
> > Exactly. I do not recommend adding a dependency between this proposal and
> > other proposals, unless those are strictly necessary.
> >
> >
> > > Other than that, the SPI interface design seems missing in the doc.
> > > That's an essential part of the metrics persistence. I think we will
> > > need more interface details to move forward.
> > >
> >
> > Note that the SPI cannot define how metrics are persisted. It is an
> > interface that should be extended so that metrics are persisted against a
> > certain database, and using a certain format. I do not think Polaris
> > should force a certain storage system for operational metrics.
> >
> > Could you list the information you would like to see added to the
> > document? I am having difficulties understanding the ask.
> >
> > --
> >
> > Pierre
> >
>