Thanks for changing the permissions. I have put some comments in the design
doc.

> Like, imagine the case where external services want to trigger a metric
recomputation after n table commits.  Then the Polaris Events API would be
the starting point.

Just to confirm, this is not a Polaris event listener but the Iceberg
Event REST endpoint (WIP), right? If we use the Polaris event listener,
we still have to figure out the protocol between the delegation-service
clients and servers, which is described in William's doc.

To be clear, I think using the Iceberg Event REST endpoint is a good
idea: it decouples the external services nicely. However, we may have
to wait for a while, as it is still WIP in the Iceberg community.

Other than that, the SPI interface design seems to be missing from the
doc. That is an essential part of the metrics persistence. I think we
will need more interface details to move forward.

Yufei


On Wed, Sep 24, 2025 at 12:30 AM Pierre Laporte <[email protected]>
wrote:

> Thanks Yufei
>
> My bad!  I just changed the sharing settings to enable comments.
>
> Here are some possible answers for your questions
>
> As I review, I had a few questions/clarifications:
> >
> >    - For trivial metrics derivable from metadata.json, do you envision
> >    external services recalculating them each time, or could Polaris parse
> > and
> >    persist them at commit time? Given Polaris has to deal with the metric
> >    endpoint (/v1/{prefix}/namespaces/{namespace}/tables/{table}/metrics)
> >    anyways, it would be reasonable to let Polaris handle everything
> around
> >    metadata.json as well.
> >
>
> The current proposal is compatible with this.  Polaris could parse/persist
> certain metrics at commit time and serve them like other metrics.
> Especially considering that the new proposal is contained within the
> Polaris runtime, the integration overhead would be minimal.
>
> I kept this out of the proposal on purpose.  I see this proposal as the
> first step of a bigger effort.  Hence the idea of defining a component that
> is versatile enough so that it can be used for more than the initial set of
> metrics.
>
> I think we could keep these trivial metrics out of the current proposal,
> for the sake of having smaller iterations.
>
>
> >    - How do you see the extra metrics services working in practice,
> >    particularly their triggering mechanism? William’s delegation service
> >    proposal (
> >
> >
> https://docs.google.com/document/d/1AhR-cZ6WW6M-z8v53txOfcWvkDXvS-0xcMe3zjLMLj8/edit?tab=t.0#heading=h.57vglsnkoru0
> > )
> >    on asynchronous tasks seems like a good starting point.
> >
>
> I am not sure I fully understand this question.  Could you clarify this?
>
> Like, imagine the case where external services want to trigger a metric
> recomputation after n table commits.  Then the Polaris Events API would be
> the starting point.  The services would subscribe to the events they are
> interested in and manage their custom triggers themselves.
>
> With the current proposal, I am not sure there would be any value in
> Polaris triggering external metric computation.  Or even being aware that
> metric computation is happening.  Does it make sense?
>
>
> >    - Do we store metrics separately or together with the other Polaris
> >    transactional tables? What retention policy do you think makes sense
> for
> >    these metrics?
> >
>
> Polaris service owners can leverage the SPI to use their preferred database
> with their own retention policy.  From the SPI perspective, what matters is
> that the latest metrics can be retrieved and that new metrics can be added.
>
> To say it differently: the SPI makes no assumptions about whether
> historical data is retained, or even about how this data is stored.
>
>
> Does it help?
>
> --
> Pierre
>
