I was definitely not aware of that endpoint, so thanks a lot for bringing it up! I am glad there is appetite for even more metrics :-)
One thing I was trying to be mindful of is the extra load that the MetaStore will have to handle. Assuming ~10 metrics per table, this could already become quite substantial for large Data Lakes. And given that some requests imply a couple of metrics per partition, the volume scales up even more. That being said, I am definitely in favor of recording the metrics sent by Iceberg clients in a database. If this can add more value to Polaris, by all means I am in. I wonder how we could best anticipate the resulting volume. For example, considering that metrics have a different lifecycle than table metadata, they should probably not be cached in the EntityCache at all; otherwise, they could easily thrash it. And we might also want to store them in a separate table, or even a separate database, depending on other constraints (e.g. leveraging a TTL for automatic cleanup).

--
Pierre

On Fri, Sep 5, 2025 at 8:05 PM Prashant Singh <[email protected]> wrote:

> Hey Pierre,
>
> Thank you for taking a look at my recommendation. I think there are
> additional benefits to these Iceberg metrics: for example, with ScanMetrics
> we literally get the expression that was applied to the query, which can
> help us identify which subset of the data is actively queried and hence
> run compaction on it.
> People already build their telemetry and triggers based on these reports,
> since this is something Iceberg natively provides.
>
> That being said, I am not against the idea of collecting telemetry (I
> think we would require auxiliary compute for doing this, though), but I
> wanted to highlight something very obvious that Polaris might be ignoring,
> and introduce it here, as I didn't find a reference in the proposal!
> Side note: catalogs such as Apache Gravitino already support this [PR
> <https://github.com/apache/gravitino/pull/1164/files>]
>
> > cannot find anything in the community Slack about people requesting
> > Polaris to support Iceberg Metrics, since we are on the Free plan
>
> Unfortunately, I don't have access to the message either, but the context
> was a Polaris user asking why Polaris isn't persisting the report that is
> sent to `/report` and how they can get that report. I suggested that they
> write their own custom metrics reporter which, rather than hitting the
> /report endpoint of Polaris, just dumps the data to a DB that their
> downstream maintenance services can use.
>
> Looking forward to discussing this more!
>
> Best,
> Prashant Singh
>
>
> On Fri, Sep 5, 2025 at 5:03 AM Pierre Laporte <[email protected]>
> wrote:
>
> > Thanks for the feedback, Prashant
> >
> > As far as I can tell, we could use the Iceberg Metrics Reporting for
> > only 3 operational metrics:
> > * Total number of files in a table (using the CommitReport)
> > * Total number of reads (the number of ScanReports)
> > * Total number of writes (the number of CommitReports)
> >
> > I don't think the other operational metrics could be computed from the
> > Iceberg Metrics, so we would still need to rely on the Events API. And
> > I am wondering whether we should really have two triggers to compute
> > metrics, considering that with the Events API, we would be able to cover
> > all documented cases.
> >
> > That being said, I suspect that there could be other operational metrics
> > missing from the design document, typically metrics that would require
> > the use of the Iceberg Metrics Reporting. Problem: I cannot find
> > anything in the community Slack about people requesting Polaris to
> > support Iceberg Metrics, since we are on the Free plan. Do you happen to
> > remember what was discussed?
> >
> > --
> >
> > Pierre
> >
> >
> > On Thu, Sep 4, 2025 at 6:27 PM Prashant Singh
> > <[email protected]> wrote:
> >
> > > Thank you for the proposal, Pierre!
> > > I think having metrics on the entities that Polaris manages is really
> > > helpful for telemetry, as well as for deciding when and on which
> > > partitions to run compactions.
> > > Iceberg already emits the metrics from the client end to the REST
> > > server via RESTMetricsReporter
> > > <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/RESTMetricsReporter.java#L60>
> > > and things like ScanMetrics
> > > <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/metrics/ScanMetrics.java>
> > > / CommitMetrics
> > > <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/metrics/CommitMetrics.java>
> > > are already available, but at this point we don't persist them and
> > > hence they are lost. There has been a request for this in the Polaris
> > > Slack too!
> > > My recommendations would start from here!
> > >
> > > Best,
> > > Prashant Singh
> > >
> > > On Thu, Sep 4, 2025 at 8:41 AM Pierre Laporte <[email protected]>
> > > wrote:
> > >
> > > > Hi folks,
> > > >
> > > > I would like to propose the addition of a component to Polaris that
> > > > would build and maintain operational metrics for the Data Lake
> > > > tables and views. The main idea is that, if those metrics can be
> > > > shared across multiple Table Management Services and/or other
> > > > external services, then it would make sense to have those metrics
> > > > served by Polaris.
> > > >
> > > > I believe this feature would not only add value to Polaris but also
> > > > further advance it as a central point in the Data Lake.
> > > >
> > > > The detailed proposal document is here:
> > > > https://docs.google.com/document/d/1yHvLwqNVD3Z84KYcc_m3c4M8bMijTXg9iP1CR0JXxCc
> > > >
> > > > Please let me know if you have any feedback or comments!
> > > >
> > > > Thanks
> > > > --
> > > >
> > > > Pierre
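For readers following along, the custom reporter workaround Prashant describes could look roughly like the sketch below. This is not Polaris or Iceberg code: the `MetricsReporter` interface here is a local stand-in for Iceberg's `org.apache.iceberg.metrics.MetricsReporter` SPI (whose real `report` method receives a `MetricsReport` object, not a map), and the database write is simulated with an in-memory list so the example is self-contained; a real implementation would serialize the report and write it over JDBC.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-in for Iceberg's MetricsReporter SPI, which exposes a single
// report(...) callback invoked by the client after each scan/commit.
interface MetricsReporter {
    void report(Map<String, String> report);
}

// Instead of POSTing each report to the catalog's /report endpoint,
// persist it somewhere that downstream services can query.
class DbDumpingMetricsReporter implements MetricsReporter {
    private final List<String> sink; // stands in for a JDBC connection / metrics table

    DbDumpingMetricsReporter(List<String> sink) {
        this.sink = sink;
    }

    @Override
    public void report(Map<String, String> report) {
        // A real implementation would serialize the ScanReport/CommitReport
        // to JSON and INSERT it into a DB table, ideally one with a TTL so
        // old reports are cleaned up automatically.
        sink.add(report.toString());
    }
}

public class ReporterSketch {
    public static void main(String[] args) {
        List<String> fakeDb = new ArrayList<>();
        MetricsReporter reporter = new DbDumpingMetricsReporter(fakeDb);
        reporter.report(Map.of("table-name", "db.orders", "report-type", "scan-report"));
        System.out.println(fakeDb); // the persisted report row(s)
    }
}
```

The point of the sketch is just the shape of the hook: the client-side reporter receives every report the engine would otherwise send to `/report`, so redirecting (or duplicating) it into a store is a small amount of code.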
