Re: [DISCUSS] Add Polaris Observe API

Michael Collado Thu, 10 Apr 2025 16:55:21 -0700

I think a simple metrics API makes a lot of sense. Decoupling it from
events is also reasonable, since it would be useful to query periodically
for a variety of reasons not tied to event triggering.
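
To make that concrete, here is a minimal Python sketch of a read-only
metrics façade that counts entities per type from what persistence already
holds. Every name in it (Entity, ObserveFacade, entity_counts) is
hypothetical and only illustrates the shape of such an API; none of it is
an existing Polaris interface.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for a catalog entity already held in persistence."""
    catalog: str
    kind: str  # e.g. "table" or "view"
    name: str


class ObserveFacade:
    """Hypothetical read-only facade: callers get metrics, never the store itself."""

    def __init__(self, entities: Iterable[Entity]):
        # Assumption: the persistence layer can list its existing entities.
        self._entities = list(entities)

    def entity_counts(self, catalog: str) -> dict[str, int]:
        """Return the number of entities of each kind in one catalog."""
        counts = Counter(e.kind for e in self._entities if e.catalog == catalog)
        return dict(counts)


facade = ObserveFacade([
    Entity("prod", "table", "orders"),
    Entity("prod", "table", "customers"),
    Entity("prod", "view", "daily_orders"),
    Entity("dev", "table", "scratch"),
])
print(facade.entity_counts("prod"))
```

A UI, CLI, or periodic poller could call something like entity_counts on
its own schedule, without any event having fired.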


Mike

On Thu, Apr 10, 2025 at 3:00 PM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Eric
>
> Thanks a lot for your feedback!
>
> As a first step, I would not store additional entities for this, but
> rather query the existing entities (tables, etc.) and the Iceberg metadata
> (including table properties) to display it.
>
> I agree about finishing Event Listeners. In the meantime, I would start
> with a first, pretty simple version of the "Observe API" (just entity
> metrics like the number of tables, views, etc.). The idea is to put a façade
> over persistence that provides some metrics (the client should not directly
> access the persistence layer). A first use case would be the UI/CLI, which
> we can extend later for a "fine-triggered" TMS.
>
> Regards
> JB
>
> On Thu, Apr 10, 2025 at 11:48 AM Eric Maynard <eric.w.mayn...@gmail.com>
> wrote:
>
> > I think the concept is really useful. The only thing I think would
> > require some more investigation is how exactly we implement this API --
> > where the data is stored, how long it's retained, etc. We might need to
> > consider pushing this data out into another service, or at least
> > supporting such an implementation.
> >
> > I'm also glad you called out the idea of a "fine-triggered" TMS based on
> > events. A while ago, I had started drafting a design with a similar idea:
> >
> > [image: Screenshot 2025-04-10 at 11.46.19 AM.png]
> >
> > The concept was that some service can scrape events from Polaris (or
> > Polaris can push events to it) and that service will persist the events
> > so that TMS, observability service, etc. can query those events.
> >
> > To this end, I think it might be worth finishing the ongoing Event
> > Listeners
> > <https://docs.google.com/document/d/1sJiFKeMlPVlqRUj8Rv4YufMMrfCq_3ZFtDuNv_8eOQ0/edit?tab=t.0#heading=h.8d519gwzsle2>
> > work, so we have a way to collect the kind of information that the
> > observe API will report. This gets us canonical event types as well.
> >
> > On Wed, Apr 9, 2025 at 10:09 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >
> >> Hi folks,
> >>
> >> I would like to discuss a proposal that I have in mind: the "observe"
> >> API.
> >>
> >> The purpose of this API is to return some metrics and gauges from
> >> Polaris, like:
> >> - what's the number of entities (number of tables, views, etc.) in a
> >> Polaris catalog
> >> - what's the number of times an entity has been accessed over a period
> >> - optionally, access to "polished" metrics from tables (extracted++ from
> >> the metadata)
> >> - optionally, provide extra details (from Parquet metrics for instance)
> >>
> >> In terms of use cases, this API could be helpful:
> >> - to have a policy "activated" depending on these metrics (something
> >> like policy A is only valid if a catalog has more than X tables, or
> >> policy B is activated when a view has been accessed more than Y times
> >> in the last hour, etc). We can have a TMS service "fine-triggered" with
> >> these policies.
> >> - to be leveraged by an FGAC mechanism (e.g. governance depending on
> >> these metrics)
> >> - to be easily displayed by a UI or CLI
> >>
> >> I already have a few ideas in mind that I would be happy to share in a
> >> design document. But before that, I would like to get your feedback
> >> about this proposal.
> >>
> >> Thanks!
> >> Regards
> >> JB
> >>
> >
>
