Hi all,

I've been thinking about how Polaris should support Iceberg scan and commit metrics. A few challenges have come up in recent discussions:

1. Synchronous metrics persistence overloads Polaris persistence because of the high volume of scan metrics [3].
2. We have spent considerable time working out metrics persistence, including the schema, SPIs, and REST APIs [4].
3. Metric filtering remains a challenge [1].
4. We need to figure out how to purge metrics, because they keep growing [2].
Looking at these challenges, most of them are not really metrics problems. They are transport, delivery, retention, and lifecycle problems that the existing event framework already addresses.

I'd like to propose using the event system to support the current use cases of Iceberg scan and commit metrics, rather than introducing a separate Polaris metrics subsystem. For these use cases, the metrics are fundamentally events with structured telemetry attached. They are append-only, generated by IRC endpoints, typically consumed asynchronously, and often forwarded to external systems. Since Polaris already needs to support them as part of IRC, treating them as event types seems like a natural fit.

More importantly, I think Polaris should remain a catalog service and telemetry producer rather than a metrics warehouse. Instead of introducing a dedicated metrics subsystem, along with its storage, retention, query, and scaling concerns, we could build on the existing event framework:

- Emit metrics through the existing event mechanism. We will do that anyway, given that it's an IRC endpoint.
- Let custom event listeners route them to the destination of choice, such as Prometheus, Grafana, RDBMSs, or other systems.
- Reuse the existing event lifecycle, retention, and delivery models.

If temporary persistence is still required, the existing event table can serve that purpose. The payload size should be manageable, given that we already include loadTable/loadView responses in events.

This approach also gives deployments the flexibility to filter, sample, or redirect high-volume scan metrics without Polaris needing backend-specific metric storage behavior. For example, event listeners can choose which metric events to process, so we don't need to implement metric filtering logic in Polaris [1].

In short, my proposal is: events provide the transport and lifecycle mechanism, while downstream metrics systems remain responsible for storage, querying, aggregation, and visualization.

Curious what others think.

1. https://lists.apache.org/thread/ogskc1szctkg5n0tdj0cm3pfkowcwx4z
2. https://lists.apache.org/thread/5nst0f2ygnl2gj3j910q7m8nk2fvokc7
3. https://lists.apache.org/thread/zp2rvsdkq3mb46722o0hfl0zh7kdqyr8
4. https://lists.apache.org/thread/qj1y7cw4dygcnczmymdwkfkp4ysq41ts

Yufei
