Hi Yufei and Dmitri,

Here is a proposal for the REST endpoints for metrics and events.

https://github.com/apache/polaris/pull/3924/changes

I did not see any precedent for raising a PR for proposals, so I am trying this.
Please let me know what you think.

-
Anand

From: Anand Kumar Sankaran <[email protected]>
Date: Monday, March 2, 2026 at 10:25 AM
To: [email protected] <[email protected]>
Subject: Re: Polaris Telemetry and Audit Trail

About the REST API, based on my use cases:


  1. I want to be able to query commit metrics to track files added / removed
     per commit, along with record counts. The ingestion pipeline that writes
     this data is owned by us and we are guaranteed to write this information
     for each write.
  2. I want to be able to query scan metrics for read. I understand clients do
     not fulfill this requirement.
  3. I want to be able to query the events table (events are persisted) - this
     may supersede #2, I am not sure yet.

All this information is in the JDBC based persistence model and is persisted in 
the metastore. I currently don’t have a need to query Prometheus or 
OpenTelemetry. I do publish some events to Prometheus and they are forwarded to 
our dashboards elsewhere.
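
To make use case 1 concrete, here is a rough sketch of how a paginated query over persisted commit metrics could behave. It is only an illustration under stated assumptions: the `commit_metrics` table, its columns, and the `list_commit_metrics` function are hypothetical and are not the Polaris schema or the proposed API. An in-memory SQLite database stands in for the JDBC-backed store.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table standing in for a commit-metrics table.

    The schema is hypothetical and only serves the illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE commit_metrics ("
        " id INTEGER PRIMARY KEY,"
        " table_name TEXT NOT NULL,"
        " added_files INTEGER NOT NULL,"
        " removed_files INTEGER NOT NULL,"
        " added_records INTEGER NOT NULL)"
    )
    rows = [(i, "db.events", i, i % 2, i * 100) for i in range(1, 8)]
    conn.executemany("INSERT INTO commit_metrics VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def list_commit_metrics(conn, table_name, page_size, page_token=None):
    """Return (rows, next_page_token) using keyset pagination on id.

    The token is the last id returned, or None when there are no more rows.
    """
    last_id = int(page_token) if page_token is not None else 0
    cur = conn.execute(
        "SELECT id, added_files, removed_files, added_records"
        " FROM commit_metrics"
        " WHERE table_name = ? AND id > ?"
        " ORDER BY id LIMIT ?",
        # Fetch one extra row to learn whether another page exists.
        (table_name, last_id, page_size + 1),
    )
    rows = cur.fetchall()
    has_more = len(rows) > page_size
    rows = rows[:page_size]
    next_token = str(rows[-1][0]) if has_more else None
    return rows, next_token
```

A caller would keep passing the returned token back until it comes back as `None`. Keyset pagination is used here instead of `OFFSET` because it stays stable while new commits are being appended, but that is a design choice of this sketch, not something decided in this thread.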

About the CLI utilities, I meant the admin user utilities. In one of the 
earliest drafts of my proposal, Prashant mentioned that the metrics tables can 
grow indefinitely and that a similar problem exists with the events table as 
well. We discussed that cleaning up of old records from both metrics tables and 
events tables can be done via a CLI utility.
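
As an illustration of what such a cleanup utility might do, here is a minimal sketch of a batched purge of old records. Everything in it is an assumption made for the example: the table names, the `created_at_ms` column, and the `purge_older_than` function are hypothetical and do not reflect the actual Polaris schema or Admin CLI. SQLite stands in for the JDBC-backed store.

```python
import sqlite3

# Hypothetical table names; an allowlist avoids building SQL from arbitrary input.
PURGEABLE_TABLES = ("commit_metrics", "scan_metrics", "events")


def purge_older_than(conn, table, cutoff_ms, batch_size=1000):
    """Delete rows with created_at_ms < cutoff_ms in batches.

    Returns the total number of rows deleted.
    """
    if table not in PURGEABLE_TABLES:
        raise ValueError(f"unsupported table: {table}")
    total = 0
    while True:
        cur = conn.execute(
            f"DELETE FROM {table} WHERE id IN ("
            f" SELECT id FROM {table} WHERE created_at_ms < ? LIMIT ?)",
            (cutoff_ms, batch_size),
        )
        # Commit per batch so each transaction stays small.
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

Deleting in bounded batches keeps individual transactions short on large tables. The same function could serve both metrics and events tables only if they share a timestamp column, which is an assumption of this sketch.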

I see that Yufei has covered the discussion about datasources.

-
Anand



From: Yufei Gu <[email protected]>
Date: Friday, February 27, 2026 at 9:54 PM
To: [email protected] <[email protected]>
Subject: Re: Polaris Telemetry and Audit Trail


As I mentioned in https://github.com/apache/polaris/issues/3890, supporting
multiple data sources is not a trivial change. I would strongly recommend
starting with a design document to carefully evaluate the architectural
implications and long term impact.

A REST endpoint to query metrics seems reasonable given the current JDBC
based persistence model. That said, we may also consider alternative
storage models. For example, if we later adopt a time series system such as
Prometheus to store metrics, the query model and access patterns would be
fundamentally different. Designing the REST API without considering these
potential evolutions may limit flexibility. I'd suggest starting with the
use case.

Yufei


On Fri, Feb 27, 2026 at 3:42 PM Dmitri Bourlatchkov <[email protected]>
wrote:

> Hi Anand,
>
> Sharing my view... subject to discussion:
>
> 1. Adding non-IRC REST API to Polaris is perfectly fine.
>
> Figuring out specific endpoint URIs and payloads might require a few
> roundtrips, so opening a separate thread for that might be best.
> Contributors commonly create Google Docs for new API proposals too (they
> are fairly easy to update as the email discussion progresses).
>
> There was a suggestion to try Markdown (with PRs) for proposals [1] ...
> feel free to give it a try if you are comfortable with that.
>
> 2. Could you clarify whether you mean end user utilities or admin user
> utilities? In the latter case those might be more suitable for the Admin
> CLI (java) not the Python CLI, IMHO.
>
> Why would these utilities be common with events? IMHO, event use cases are
> distinct from scan/commit metrics.
>
> 3. I'd prefer separating metrics persistence from MetaStore persistence at
> the code level, so that they could be mixed and matched independently. The
> separate datasource question will become a non-issue with that approach, I
> guess.
>
> The rationale for separating scan metrics and metastore persistence is that
> "cascading deletes" between them are hardly ever required. Furthermore, the
> data and query patterns are very different so different technologies might
> be beneficial in each case.
>
> [1] https://lists.apache.org/thread/yto2wp982t43h1mqjwnslswhws5z47cy
>
> Cheers,
> Dmitri.
>
> On Fri, Feb 27, 2026 at 6:19 PM Anand Kumar Sankaran via dev <
> [email protected]> wrote:
>
> > Thanks all. This PR is merged now.
> >
> > Here are the follow-up features / work needed. These were all part of the
> > merged PR at some point in time and were removed to reduce scope.
> >
> > Please let me know what you think.
> >
> >
> >   1.  A REST API to paginate through table metrics. This will be a
> > non-IRC-standard addition.
> >   2.  Utilities for managing old records, should be common with events.
> > There was some discussion that it belongs to the CLI.
> >   3.  Separate datasource (metrics, events, even other tables?).
> >
> >
> > Anything else?
> >
> > -
> > Anand
> >
> >
>
