I think a simple metrics API makes a lot of sense. Decoupling it from events is also the right call, since the metrics would be useful to query periodically for a variety of reasons not tied to event triggering.

Mike

On Thu, Apr 10, 2025 at 3:00 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi Eric
>
> Thanks a lot for your feedback!
>
> As a first step, I would not store additional entities for that, but
> rather query the existing entities (tables, etc) and the Iceberg metadata
> (including table properties) to display it.
>
> I agree about finishing Event Listeners. In the meantime, I would start
> with a first version of the "Observe API", pretty simple (just entity
> metrics like number of tables, views, etc). The idea is to put a façade
> over persistence to provide some kind of metrics (the client should not
> directly access the persistence layer). A first use case would be UI/CLI,
> which we can extend later for "fine-triggered" TMS.
>
> Regards
> JB
>
> On Thu, Apr 10, 2025 at 11:48 AM Eric Maynard <eric.w.mayn...@gmail.com>
> wrote:
>
> > I think the concept is really useful. The only thing I think would
> > require some more investigation is how exactly we implement this API --
> > where the data is stored, how long it's retained, etc. We might need to
> > consider pushing this data out into another service or at least
> > supporting such an implementation.
> >
> > I'm also glad you called out the idea of a "fine-triggered" TMS based on
> > events. A while ago, I had started drafting a design with a similar idea:
> >
> > [image: Screenshot 2025-04-10 at 11.46.19 AM.png]
> >
> > The concept was that some service can scrape events from Polaris (or
> > Polaris can push events to it) and that service will persist the events
> > so that TMS, observability service, etc. can query those events.
> >
> > To this end, I think it might be worth finishing the ongoing Event
> > Listeners
> > <https://docs.google.com/document/d/1sJiFKeMlPVlqRUj8Rv4YufMMrfCq_3ZFtDuNv_8eOQ0/edit?tab=t.0#heading=h.8d519gwzsle2>
> > work, so we have a way to collect the kind of information that the
> > observe API will report. This gets us canonical event types as well.
> >
> > On Wed, Apr 9, 2025 at 10:09 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >
> >> Hi folks,
> >>
> >> I would like to discuss a proposal that I have in mind: the "observe"
> >> API.
> >>
> >> The purpose of this API is to return some metrics and gauges from
> >> Polaris, like:
> >> - what's the number of entities (number of tables, views, etc) in a
> >> Polaris catalog
> >> - what's the number of times an entity has been accessed over a period
> >> - optionally, access to "polished" metrics from a table (extracted++
> >> from the metadata)
> >> - optionally, provide extra details (from Parquet metrics for instance)
> >>
> >> In terms of use cases, this API could be helpful:
> >> - to have a policy "activated" depending on these metrics (something
> >> like policy A is only valid if a catalog has more than X tables, or
> >> policy B is activated when a view has been accessed more than Y times
> >> in the last hour, etc). We can have a TMS service "fine-triggered" with
> >> these policies.
> >> - to be leveraged by an FGAC mechanism (e.g. governance depending on
> >> these metrics)
> >> - to be easily displayed by a UI or CLI
> >>
> >> I already have a few ideas in mind that I would be happy to share in a
> >> design document. But before that, I would like to get your feedback
> >> about this proposal.
> >>
> >> Thanks!
> >> Regards
> >> JB
> >>
> >
>
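
For illustration only, here is a rough sketch of the first-version idea discussed in this thread: a read-only façade that derives entity counts from what is already persisted, plus the "policy A is only valid if a catalog has more than X tables" check. It is written in Python for brevity and is not Polaris code. Every name in it (EntityStore, ObserveFacade, CatalogMetrics, InMemoryStore, policy_active) is hypothetical, and the (entity_type, name) pair format is an assumption made for the example.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Protocol


class EntityStore(Protocol):
    """Hypothetical read-only view of the persistence layer."""

    def list_entities(self, catalog: str) -> Iterable[tuple[str, str]]:
        """Return (entity_type, name) pairs for the given catalog."""
        ...


@dataclass(frozen=True)
class CatalogMetrics:
    """Entity counts for one catalog (hypothetical shape)."""

    catalog: str
    tables: int
    views: int


class ObserveFacade:
    """Clients ask the facade for metrics and never touch the store."""

    def __init__(self, store: EntityStore) -> None:
        self._store = store

    def catalog_metrics(self, catalog: str) -> CatalogMetrics:
        # Nothing extra is stored: counts are derived from existing entities.
        counts = Counter(kind for kind, _name in self._store.list_entities(catalog))
        return CatalogMetrics(catalog, counts["table"], counts["view"])


class InMemoryStore:
    """Stand-in store so the sketch runs without a real backend."""

    def __init__(self, entities: dict[str, list[tuple[str, str]]]) -> None:
        self._entities = entities

    def list_entities(self, catalog: str) -> Iterable[tuple[str, str]]:
        return self._entities.get(catalog, [])


def policy_active(metrics: CatalogMetrics, min_tables: int) -> bool:
    """Policy is active only if the catalog has more than min_tables tables."""
    return metrics.tables > min_tables


if __name__ == "__main__":
    store = InMemoryStore(
        {"sales": [("table", "orders"), ("table", "customers"), ("view", "daily")]}
    )
    metrics = ObserveFacade(store).catalog_metrics("sales")
    print(metrics, policy_active(metrics, min_tables=1))
```

A UI or CLI would call only catalog_metrics, so the storage and retention questions raised above could be settled later by swapping the store behind the façade.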