Hi folks,

Oleg and I have been working on the proposal to take your feedback into account. The updated doc is available at
https://docs.google.com/document/d/1oFsuI_WKY0QqVqBNS4gtLlDdlZiV9fGmmUG3rLSED4Y/edit?tab=t.0#heading=h.1duembdpfkwi

Notable changes:

- For the first phase of the proposal, Polaris will collect certain Iceberg metrics provided by query engines via the IRC, leveraging Snapshot Summaries (e.g. total number of files, total number of data/delete files added, ...).
- Two SPI implementations are provided. One of them stores the latest metric values in Polaris' Postgres database, and a database schema example is included.
- A new SPI method has been added so that Polaris can request metrics deletion when a table is dropped.

The rest of the proposal has not changed much. It still allows us to add more metric definitions in the registry, as we see fit. It still includes the same API endpoints and RBAC integration. And it is compatible with supporting metrics for non-Iceberg tables.

Hopefully, this gives a better view of how operational metrics can be either collected directly by Polaris or pushed by external services (the latter will come in a subsequent phase, to keep this proposal short).

Cheers
--
Pierre

On Sun, Oct 5, 2025 at 5:19 PM Pierre Laporte <[email protected]> wrote:

> On Fri, Oct 3, 2025 at 7:28 PM Eric Maynard <[email protected]> wrote:
>
>>> IMHO, we should not add a dependency between this proposal and other
>>> efforts that are not implemented yet, as it would prevent us from moving
>>> forward on operational metrics until all the pieces are in place.
>>
>> This is an interesting argument given that the delegation service proposal
>> you mention was/is blocked because of another effort that was not (is not?)
>> implemented. I still don’t understand how this is materially different and
>> thought the delegation service was intended to support these operational
>> metrics.
>
> Let's not conflate the current proposal with the other discussions about
> how Polaris could execute synchronous/asynchronous tasks. We should
> continue the discussions about the Delegation Service and the Async &
> Reliable Tasks proposals in their respective threads. And those
> discussions should (IMHO) not prevent us from moving forward on the
> operational metrics bits we agree on.
>
>> I do think we need to figure out on a high level which direction we’re
>> going here rather than just rush forward with the first proposal that
>> doesn’t immediately get a -1.
>
> I am not sure I understand this statement. What I am proposing is that we
> start implementing the parts that we have consensus on, and we continue
> discussing the other parts. I would not call that "rush forward with the
> first proposal that doesn’t immediately get a -1".
>
> Let me repeat my question as I think it is important we decide on an
> answer to avoid confusion: As we discussed in this thread and during
> previous community calls, the goal of the second proposal is to start
> small, and build our way up. It is not about having a perfect design
> document before starting implementation. Has this changed?
>
> I am personally in favor of that incremental approach. That being said,
> if instead of going that way, the community would rather have a fully
> designed system before any of the implementation work happens, that's fine
> by me. We just need to clarify it so that there is no ambiguity.
