> IMHO, we should not add a dependency between this proposal and other efforts that are not implemented yet, as it would prevent us from moving forward on operational metrics until all the pieces are in place.

This is an interesting argument given that the delegation service proposal you mention was/is blocked because of another effort that was not (is not?) implemented. I still don’t understand how this is materially different and thought the delegation service was intended to support these operational metrics.

I do think we need to figure out at a high level which direction we’re going here rather than just rush forward with the first proposal that doesn’t immediately get a -1.

On Fri, Oct 3, 2025 at 2:01 AM Pierre Laporte <[email protected]> wrote:

> Thanks for your feedback, folks
>
> @Yufei there seems to be a misunderstanding. As we discussed in this thread and during previous community calls, the goal of the second proposal is to start small, and build our way up. It is not about having a perfect design document before starting implementation. Has this changed?
>
> The previous proposal included the possibility to store metrics in the Metastore. And there were some concerns about whether we should store just the latest metric values, or keep track of historical data, should we choose a database, define retention policies, etc. All very good points! And I think those points deserve separate discussion.
>
> So the current proposal abstracts that behind an SPI. In other words, the current proposal defines the necessary parts that will allow us to plug a database and data model in a second phase. The goal is to build consensus, move forward with the bits we already agree on, and continue iterating while the implementation is in progress.
>
> > This proposal seems to address both. For category (2), we’ll need a clear design for how it integrates with an external service. That should cover aspects such as workload life cycle management (triggering, state control, etc.).
>
> This has to be a separate design document. The first proposal mentioned an integration with the Async Tasks framework, eventually. AFAICT, the Delegation Service proposal is not merged either.
>
> IMHO, we should not add a dependency between this proposal and other efforts that are not implemented yet, as it would prevent us from moving forward on operational metrics until all the pieces are in place.
>
> > That said, I think it would be reasonable to narrow the initial scope to category (1). Could we clarify that in the proposal?
>
> This is not what the current document says, though. The updated proposal defines the backbone that enables Polaris to store and serve metrics.
>
> > On persistence, I believe the most critical part of the design is the schema. Once the schema is defined, the SPI details could be derived relatively easily. One important factor we shouldn’t overlook is the type of database we want to leverage: a time series database (TSDB) or a general-purpose OLTP database. This choice will heavily influence schema design. For instance, a TSDB schema must include a timestamp, metric name, and dimensions, while an OLTP schema is more generic and flexible. The choice also affects the SPI design. For example, aggregations, rollups, and sliding windows are first-class operations in TSDB, while joins are supported better in OLTP. Could we add this consideration to the design document?
>
> I want to ask, is this really a blocker for the current proposal? Or can the current proposal be implemented while we iron out the metrics persistence details?
>
> I have to ask because you raise very good questions:
> * Should we decide now whether the metrics database should be a TSDB or an OLTP database?
> * Should Polaris bundle it in its distribution?
> * Or should it include a connector to said database, in which case users have to provide Polaris with connection parameters?
> * The answer to ^ determines whether the retention policy is Polaris' responsibility
>
> The SPI should not be designed around a single database. It should be designed to support the operational metrics service features. And it should be abstract enough that it can be extended later to use different databases.
>
> > - Polaris controls which metrics are calculated and when (benefitting from event listeners).
> > - Polaris delegates computation to external engines if needed (SPI or API?).
>
> @Oleg that is also a very interesting point. To me, those two points would make Polaris reimplement a message queue, with external engines requesting Polaris to trigger metric computation after a certain threshold has happened (e.g. every x commits on a table or after n minutes at most, ...). And I wonder whether Polaris' Events could instead be forwarded to an external message queue that supports this aggregate/dispatch work well.
>
> But as you can guess, it is going to be quite a discussion. Because depending on how this discussion goes, Polaris Events interface should be updated to support external message queues, or Polaris should define a triggering system that enables external systems to define their own triggers.
>
> My main point is, those topics are independent of the metrics REST API definition, the RBAC integration, ... parts that we have consensus on. I.e. start small and iterate.
>
> Wdyt?
