Re: [PROPOSAL] Add Data Lake operational metrics to Polaris

Pierre Laporte Sat, 18 Oct 2025 02:56:21 -0700

Thanks, Yufei.

My bad!  I just changed the sharing settings to enable comments.


Here are some possible answers to your questions:

> As I review, I had a few questions/clarifications:
>
>    - For trivial metrics derivable from metadata.json, do you envision
>    external services recalculating them each time, or could Polaris parse
>    and persist them at commit time? Given Polaris has to deal with the
>    metric endpoint
>    (/v1/{prefix}/namespaces/{namespace}/tables/{table}/metrics) anyways,
>    it would be reasonable to let Polaris handle everything around
>    metadata.json as well.
>

The current proposal is compatible with this.  Polaris could parse/persist
certain metrics at commit time and serve them like other metrics.
Especially considering that the new proposal is contained within the
Polaris runtime, the integration overhead would be minimal.

I kept this out of the proposal on purpose.  I see this proposal as the
first step of a bigger effort.  Hence the idea of defining a component that
is versatile enough so that it can be used for more than the initial set of
metrics.

I think we could keep these trivial metrics out of the current proposal,
for the sake of having smaller iterations.
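
As a rough illustration of what "trivial metrics derivable from
metadata.json" could look like if they were extracted at commit time, here
is a minimal sketch.  It is not part of the proposal.  The class and method
names are hypothetical, and it assumes the caller has already parsed the
current snapshot's summary map out of metadata.json.  The keys are the
optional snapshot summary fields from the Iceberg spec, so any of them may
be absent.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Polaris class.
final class TrivialMetrics {
  // Optional snapshot summary keys from the Iceberg spec; writers may omit them.
  private static final String[] KEYS = {
    "total-records", "total-data-files", "total-files-size"
  };

  // Copies the numeric summary fields that are present and well-formed.
  static Map<String, Long> fromSnapshotSummary(Map<String, String> summary) {
    Map<String, Long> metrics = new LinkedHashMap<>();
    for (String key : KEYS) {
      String raw = summary.get(key);
      if (raw == null) {
        continue; // field not written by the engine that produced the snapshot
      }
      try {
        metrics.put(key, Long.parseLong(raw));
      } catch (NumberFormatException e) {
        // Ignore malformed values rather than failing the commit path.
      }
    }
    return metrics;
  }
}
```

Whether such values would then be persisted by Polaris or recomputed by an
external service is exactly the open question above.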


>    - How do you see the extra metrics services working in practice,
>    particularly their triggering mechanism? William’s delegation service
>    proposal
>    (https://docs.google.com/document/d/1AhR-cZ6WW6M-z8v53txOfcWvkDXvS-0xcMe3zjLMLj8/edit?tab=t.0#heading=h.57vglsnkoru0)
>    on asynchronous tasks seems like a good starting point.
>

I am not sure I fully understand this question.  Could you clarify?

For example, imagine that external services want to trigger a metric
recomputation after n table commits.  Then the Polaris Events API would be
the starting point.  The services would subscribe to the events they are
interested in and manage their custom triggers themselves.
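
A minimal sketch of such a service-side trigger, assuming the service
already receives table commit events somehow.  Everything here is
hypothetical: CommitCountTrigger and onTableCommit are illustrative names,
not Polaris Events API types, and the wiring to the actual event
subscription is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical external-service component, not part of Polaris.
final class CommitCountTrigger {
  private final int threshold;
  private final Consumer<String> recompute;
  private final Map<String, AtomicInteger> commitsSinceLastRun = new ConcurrentHashMap<>();

  CommitCountTrigger(int threshold, Consumer<String> recompute) {
    if (threshold < 1) {
      throw new IllegalArgumentException("threshold must be >= 1");
    }
    this.threshold = threshold;
    this.recompute = recompute;
  }

  // Would be called by the service's event subscriber for each commit event.
  void onTableCommit(String tableId) {
    AtomicInteger counter =
        commitsSinceLastRun.computeIfAbsent(tableId, id -> new AtomicInteger());
    // Atomically count the commit; the counter wraps to 0 on every n-th commit.
    int next = counter.updateAndGet(c -> (c + 1) % threshold);
    if (next == 0) {
      recompute.accept(tableId);
    }
  }
}
```

The counting and the decision to recompute live entirely in the external
service, so Polaris only needs to emit the events.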

With the current proposal, I am not sure there would be any value in
Polaris triggering external metric computation.  Or even being aware that
metric computation is happening.  Does that make sense?


>    - Do we store metrics separately or together with the other Polaris
>    transactional tables? What retention policy do you think makes sense for
>    these metrics?
>

Polaris service owners can leverage the SPI to use their preferred database
with their own retention policy.  From the SPI perspective, what matters is
that the latest metrics can be retrieved and that new metrics can be added.

Put differently: the SPI does not make any assumptions about whether
historical data is retained, or even how this data is stored.
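
To illustrate the contract described above (add new metrics, retrieve the
latest ones), here is a minimal sketch.  The names TableMetrics,
MetricsStore and LatestOnlyMetricsStore are hypothetical and do not come
from the proposal.  The in-memory implementation deliberately keeps no
history, which is one valid retention policy among many.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical value type: one set of metric values computed for one table.
record TableMetrics(String tableId, Instant computedAt, Map<String, Long> values) {}

// Hypothetical SPI shape: only "add" and "latest" are required of a backend.
interface MetricsStore {
  void add(TableMetrics metrics);

  Optional<TableMetrics> latest(String tableId);
}

// Example backend that retains only the newest entry per table.
final class LatestOnlyMetricsStore implements MetricsStore {
  private final Map<String, TableMetrics> latestByTable = new ConcurrentHashMap<>();

  @Override
  public void add(TableMetrics metrics) {
    // Keep the incoming entry only if it is newer than the stored one.
    latestByTable.merge(
        metrics.tableId(),
        metrics,
        (old, incoming) -> incoming.computedAt().isAfter(old.computedAt()) ? incoming : old);
  }

  @Override
  public Optional<TableMetrics> latest(String tableId) {
    return Optional.ofNullable(latestByTable.get(tableId));
  }
}
```

A backend that keeps full history in a separate database would implement
the same two methods, and callers would not see the difference.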


Does that help?

--
Pierre
