Re: Provide a mechanism to purge the events/metrics table

Alexandre Dutra Thu, 11 Jun 2026 06:31:54 -0700

Hi all,

I like the idea of a maintenance section in the Helm chart that would
create Jobs or CronJobs delegating to various admin commands. This
design looks clean to me, and corresponds to how the admin tool was
designed to be used.
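
As a rough illustration only (not a proposal for the actual chart layout),
here is a small Python sketch of the mapping I have in mind: each enabled
entry in a maintenance section becomes one Kubernetes CronJob whose
container simply runs an admin tool sub-command. The `maintenance.tasks`
layout, the `purge-events` / `purge-metrics` sub-command names, their
flags, and the image name are all hypothetical placeholders; only the
`batch/v1` CronJob fields are real Kubernetes API.

```python
from typing import Any


def render_cronjobs(maintenance: dict[str, Any], image: str) -> list[dict[str, Any]]:
    """Turn a hypothetical `maintenance` values section into CronJob manifests."""
    manifests = []
    for name, task in maintenance.get("tasks", {}).items():
        if not task.get("enabled", False):
            continue  # disabled tasks produce no Kubernetes object
        manifests.append({
            "apiVersion": "batch/v1",
            "kind": "CronJob",
            "metadata": {"name": f"polaris-maintenance-{name}"},
            "spec": {
                "schedule": task["schedule"],   # standard cron expression
                "concurrencyPolicy": "Forbid",  # never overlap two cleanup runs
                "jobTemplate": {"spec": {"template": {"spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": name,
                        "image": image,
                        # The job only delegates to the admin tool.
                        "args": list(task["args"]),
                    }],
                }}}},
            },
        })
    return manifests


# Hypothetical values: sub-command names and flags are placeholders,
# not existing admin tool commands.
values = {
    "tasks": {
        "purge-events": {
            "enabled": True,
            "schedule": "0 3 * * *",
            "args": ["purge-events", "--older-than", "30d"],
        },
        "purge-metrics": {
            "enabled": False,
            "schedule": "0 4 * * 0",
            "args": ["purge-metrics", "--older-than", "90d"],
        },
    }
}

for job in render_cronjobs(values, "example.invalid/polaris-admin-tool:dev"):
    print(job["metadata"]["name"], job["spec"]["schedule"])
```

In a real chart this mapping would live in a Helm template rather than in
Python; the point is only that the chart stays a thin scheduling layer and
all cleanup logic remains in the admin tool.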


Thanks,
Alex

On Thu, Jun 11, 2026 at 1:17 AM Yong Zheng <[email protected]> wrote:
>
> Yes, the Helm maintenance section would create a k8s CronJob. For
> non-k8s environments, you would just need to invoke the CLI periodically
> with your job orchestrator.
>
> Thanks,
> Yong Zheng
>
> > On Jun 10, 2026, at 4:59 PM, Dmitri Bourlatchkov <[email protected]> wrote:
> >
> > Hi Nandor,
> >
> > I was thinking about a k8s cron job too for OSS charts.
> >
> > In non-k8s environments, users will have to find a way to call the new
> > admin tool command.
> >
> > Cheers,
> > Dmitri.
> >
> >> On Wed, Jun 10, 2026 at 3:55 PM Nándor Kollár <[email protected]> wrote:
> >>
> >> +1 for the Helm chart maintenance section too. Would that create a k8s
> >> cron job, which periodically executes the cleanup admin command?
> >> Customers who don't use Kubernetes would handle the scheduling in
> >> their own system, for example by configuring a cron job on a VM?
> >>
> >> On Tue, Jun 9, 2026 at 5:34 AM Dmitri Bourlatchkov <[email protected]>
> >> wrote:
> >>>
> >>> Hi Yong,
> >>>
> >>> +1 to adding a maintenance section to the helm chart.
> >>>
> >>> Cheers,
> >>> Dmitri.
> >>>
> >>> On Mon, Jun 8, 2026 at 10:13 PM Yong Zheng <[email protected]> wrote:
> >>>
> >>>> Hello Nándor and Dmitri,
> >>>>
> >>>> I agree this is becoming more important as we persist more data in the
> >>>> Polaris backend. Today we have at least the events tables and the
> >>>> persisted Iceberg metrics tables that need some form of cleanup and
> >>>> retention management.
> >>>>
> >>>> The admin tool approach sounds reasonable to me. It gives operators
> >>>> control over when cleanup runs and allows them to use existing
> >>>> scheduling mechanisms such as k8s cron jobs.
> >>>>
> >>>> It would also be nice to avoid building a separate cleanup solution for
> >>>> every feature. If we go down the admin tool route, perhaps we can have
> >>>> a common maintenance framework that supports events cleanup, metrics
> >>>> cleanup, engine-specific maintenance tasks (for example, rebuilding
> >>>> indexes), as well as future maintenance operations.
> >>>>
> >>>> I am pretty open on the implementation details. One thing that I
> >>>> think would be beneficial is introducing a maintenance section in the
> >>>> Polaris Helm chart. That would allow operators to configure and
> >>>> schedule maintenance tasks without having to create separate one-off
> >>>> charts or jobs for each task.
> >>>>
> >>>> Thanks,
> >>>> Yong Zheng
> >>>>
> >>>>
> >>>> On Mon, Jun 8, 2026 at 8:01 PM Dmitri Bourlatchkov <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Hi Yong,
> >>>>>
> >>>>> Thanks for starting this discussion!
> >>>>>
> >>>>> From my POV the Admin tool does look like a good fit for this
> >>>>> capability. It is similar to the NoSQL maintenance task [3395].
> >>>>>
> >>>>> I believe end users could then schedule the maintenance runs
> >>>>> according to their deployment mechanics, e.g. via k8s jobs.
> >>>>>
> >>>>> I made an attempt at refactoring the Admin CLI for pluggability in
> >>>>> terms of sub-commands in [3947]. We could revive that PR if there's
> >>>>> community interest. The Metrics / Events maintenance tasks could then
> >>>>> be plugged in similarly to NoSQL maintenance.
> >>>>>
> >>>>> [3395] https://github.com/apache/polaris/pull/3395
> >>>>>
> >>>>> [3947] https://github.com/apache/polaris/pull/3947
> >>>>>
> >>>>> Cheers,
> >>>>> Dmitri.
> >>>>>
> >>>>> On Sun, Jun 7, 2026 at 2:34 PM Yong Zheng <[email protected]> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> A while back Alex raised
> >>>>>> https://github.com/apache/polaris/issues/2573 to request a mechanism
> >>>>>> to purge the events table. Recently, persisted Iceberg metrics were
> >>>>>> also introduced (https://github.com/apache/polaris/pull/3385), which
> >>>>>> created two tables (read and write metrics tables) that also lack
> >>>>>> life cycle management, so their size will grow indefinitely. We will
> >>>>>> likely need a mechanism to handle both.
> >>>>>>
> >>>>>> I am wondering what the community thinks about this. Should this be
> >>>>>> part of the admin tool, where admins/ops make the call on when to
> >>>>>> clean up, or should we have a janitor process that runs automatically
> >>>>>> (users would need to provide rules on what to clean up, such as a
> >>>>>> time-based TTL)?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Yong Zheng
> >>>>>>
> >>>>>
> >>>>
> >>
