Hey Pierre,
Thank you for taking a look at my recommendation. I think these Iceberg
metrics have additional benefits. For example, with the scan reports that
carry ScanMetrics, we literally get the filter expression that was applied
to the query, which can tell us which subset of the data is actively
queried and hence where to run compaction.
People already build their telemetry and triggers on top of these reports,
since this is something Iceberg provides natively.
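
To make the compaction point a bit more concrete, here is a rough sketch in
Python. It assumes the scan reports have already been persisted as JSON in
roughly the shape the REST spec uses (`table-name`, `filter`), and that the
filter follows Iceberg's expression JSON (`type`, `term`, `left`, `right`,
`child`); those field names are my understanding and should be checked
against the spec. `referenced_terms` and `hot_columns` are hypothetical
helper names, and the sample reports are made up for illustration.

```python
from collections import Counter


def referenced_terms(expr):
    """Yield the column names referenced by a JSON-encoded filter expression."""
    if not isinstance(expr, dict):
        return  # bare literals (e.g. true/false) reference no column
    kind = expr.get("type")
    if kind in ("and", "or"):
        yield from referenced_terms(expr.get("left"))
        yield from referenced_terms(expr.get("right"))
    elif kind == "not":
        yield from referenced_terms(expr.get("child"))
    else:
        term = expr.get("term")
        if isinstance(term, dict):  # assumed shape of a transform term
            term = term.get("term")
        if isinstance(term, str):
            yield term


def hot_columns(scan_reports):
    """Count, per table, how often each column shows up in scan filters."""
    counts = {}
    for report in scan_reports:
        table = report["table-name"]
        counts.setdefault(table, Counter()).update(
            referenced_terms(report.get("filter"))
        )
    return counts


# Made-up sample reports, only for illustration.
reports = [
    {"table-name": "db.events",
     "filter": {"type": "eq", "term": "event_date", "value": "2025-09-04"}},
    {"table-name": "db.events",
     "filter": {"type": "and",
                "left": {"type": "gt-eq", "term": "event_date", "value": "2025-09-01"},
                "right": {"type": "eq", "term": "country", "value": "FR"}}},
]
print(hot_columns(reports)["db.events"].most_common())
```

A maintenance service could use counts like these to prioritize compaction
on the columns or partitions that queries actually filter on.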

That being said, I am not against the idea of collecting telemetry (though
I think we would require auxiliary compute for doing this). I just wanted
to highlight something fairly obvious that Polaris might be overlooking
while introducing a new mechanism, as I didn't find a reference to it in
the proposal!
Side note: catalogs such as Apache Gravitino already support this [PR
<https://github.com/apache/gravitino/pull/1164/files>]

>  cannot find anything in the community Slack about people requesting
Polaris to support Iceberg Metrics, since we are on the Free plan

Unfortunately, I don't have access to the message either, but the context
was that a Polaris user was asking why Polaris isn't persisting the report
sent to `/report`, and how they could get that report. I suggested that
they write their own custom metrics reporter which, rather than hitting the
`/report` endpoint of Polaris, just dumps the data to a DB that their
downstream maintenance services can use.
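
As a rough illustration of the "dump to a DB" part, here is a minimal
sketch in Python using SQLite. A real reporter would be a Java class
implementing Iceberg's `org.apache.iceberg.metrics.MetricsReporter` and, if
I remember correctly, registered through the `metrics-reporter-impl`
catalog property; the sketch below only models the persistence and lookup
side. The table name `metrics_reports` and the helper functions are
hypothetical, and the report payloads are trimmed-down examples.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the store that holds raw metrics reports."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics_reports ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " table_name TEXT NOT NULL,"
        " report_type TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    return conn


def persist_report(conn, report):
    """Store one report as JSON, keyed by table name and report type."""
    conn.execute(
        "INSERT INTO metrics_reports (table_name, report_type, payload)"
        " VALUES (?, ?, ?)",
        (report["table-name"], report["report-type"], json.dumps(report)),
    )
    conn.commit()


def report_counts(conn, table_name):
    """Return the number of stored reports per report type for one table."""
    rows = conn.execute(
        "SELECT report_type, COUNT(*) FROM metrics_reports"
        " WHERE table_name = ? GROUP BY report_type",
        (table_name,),
    )
    return dict(rows.fetchall())


conn = open_store()
persist_report(conn, {"report-type": "scan-report",
                      "table-name": "db.events", "snapshot-id": 1})
persist_report(conn, {"report-type": "commit-report",
                      "table-name": "db.events", "snapshot-id": 2})
print(report_counts(conn, "db.events"))
```

Keeping the full payload as JSON means downstream services can still read
the filter expression and the individual counters later, while the two
indexed columns are enough for simple counts of reads and writes per table.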

Looking forward to discussing this more!

Best,
Prashant Singh


On Fri, Sep 5, 2025 at 5:03 AM Pierre Laporte <[email protected]> wrote:

> Thanks for the feedback, Prashant
>
> As far as I can tell, we could use the Iceberg Metrics Reporting for only 3
> operational metrics:
> * Total number of files in a table (using the CommitReport)
> * Total number of reads (the number of ScanReport)
> * Total number of writes (the number of CommitReport)
>
> I don't think the other operational metrics could be computed from the
> Iceberg Metrics.  So we would still need to rely on the Events API.  And I
> am wondering whether we should really have two triggers to compute metrics,
> considering that with the Events API, we would be able to cover all
> documented cases.
>
> That being said, I suspect that there could be other operational metrics
> that are missing from the design document.  Typically metrics that would
> require the use of the Iceberg Metrics Reporting.  Problem: I cannot find
> anything in the community Slack about people requesting Polaris to support
> Iceberg Metrics, since we are on the Free plan.  Do you happen to remember
> what was discussed?
>
> --
>
> Pierre
>
>
> On Thu, Sep 4, 2025 at 6:27 PM Prashant Singh
> <[email protected]> wrote:
>
> > Thank you for the proposal Pierre !
> > I think having metrics on the entities that Polaris is really helpful for
> > telemetry as well making decisions on when and what partitions to run
> > compactions.
> > Iceberg already emits the metric from client end to the rest server
> > via RestMetricsReporter
> > <
> >
> https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/RESTMetricsReporter.java#L60
> > >
> > and
> > things like ScanMetrics
> > <
> >
> https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/metrics/ScanMetrics.java
> > >
> > /
> > CommitMetrics
> > <
> >
> https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/metrics/CommitMetrics.java
> > >
> > are already available but at this point we don't persist them and hence
> > they are lost, there has been a request for this in Polaris slack too !
> > My recommendations would start from here !
> >
> > Best,
> > Prashant Singh
> >
> > On Thu, Sep 4, 2025 at 8:41 AM Pierre Laporte <[email protected]>
> > wrote:
> >
> > > Hi folks,
> > >
> > > I would like to propose the addition of a component to Polaris that
> would
> > > build and maintain operational metrics for the Data Lake tables and
> > views.
> > > The main idea is that, if those metrics can be shared across multiple
> > Table
> > > Management Services and/or other external services, then it would make
> > > sense to have those metrics served by Polaris.
> > >
> > > I believe this feature would not only add value to Polaris but also
> > further
> > > advance it as central point in the Data Lake.
> > >
> > > The detailed proposal document is here:
> > >
> > >
> >
> https://docs.google.com/document/d/1yHvLwqNVD3Z84KYcc_m3c4M8bMijTXg9iP1CR0JXxCc
> > >
> > > Please let me know if you have any feedback or comment !
> > >
> > > Thanks
> > > --
> > >
> > > Pierre
> > >
> >
>
