+1 for the Helm chart maintenance section too. Would that create a k8s CronJob that periodically executes the cleanup admin command? And customers who don't use Kubernetes would handle the scheduling in their own system, for example by configuring a cron job on a VM?
Dmitri Bourlatchkov <[email protected]> wrote (on Tue, Jun 9, 2026, 5:34):

> Hi Yong,
>
> +1 to adding a maintenance section to the helm chart.
>
> Cheers,
> Dmitri.
>
> On Mon, Jun 8, 2026 at 10:13 PM Yong Zheng <[email protected]> wrote:
>
> > Hello Nándor and Dmitri,
> >
> > I agree this is becoming more important as we persist more data in the
> > Polaris backend. Today we have at least the events tables and the
> > persisted Iceberg metrics tables that need some form of cleanup and
> > retention management.
> >
> > The admin tool approach sounds reasonable to me. It gives operators
> > control over when cleanup runs and allows them to use existing
> > scheduling mechanisms such as k8s cron jobs.
> >
> > It would also be nice to avoid building a separate cleanup solution for
> > every feature. If we go down the admin tool route, perhaps we can have a
> > common maintenance framework that supports events cleanup, metrics
> > cleanup, engine-specific maintenance tasks (for example, rebuilding
> > indexes), as well as future maintenance operations.
> >
> > I am pretty open-ended on the implementation details. One thing that I
> > think would be beneficial is introducing a maintenance section in the
> > Polaris helm chart. That would allow operators to configure and schedule
> > maintenance tasks without having to create separate one-off charts or
> > jobs for each task.
> >
> > Thanks,
> > Yong Zheng
> >
> > On Mon, Jun 8, 2026 at 8:01 PM Dmitri Bourlatchkov <[email protected]>
> > wrote:
> >
> > > Hi Yong,
> > >
> > > Thanks for starting this discussion!
> > >
> > > From my POV the Admin tool does look like a good fit for this
> > > capability. It is similar to the NoSQL maintenance task [3395].
> > >
> > > I believe end users could then schedule the maintenance runs according
> > > to their deployment mechanics, e.g. via k8s jobs.
> > >
> > > I made an attempt at refactoring the Admin CLI for pluggability in
> > > terms of sub-commands in [3947]. We could revive that PR if there's
> > > community interest. The Metrics / Events maintenance tasks could then
> > > be plugged in similarly to NoSQL maintenance.
> > >
> > > [3395] https://github.com/apache/polaris/pull/3395
> > >
> > > [3947] https://github.com/apache/polaris/pull/3947
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Sun, Jun 7, 2026 at 2:34 PM Yong Zheng <[email protected]> wrote:
> > >
> > > > Hello,
> > > >
> > > > A while back Alex raised
> > > > https://github.com/apache/polaris/issues/2573 requesting a mechanism
> > > > to purge the events table. Recently, persisted Iceberg metrics were
> > > > also introduced (https://github.com/apache/polaris/pull/3385), which
> > > > created two tables (read and write metrics tables) that also lack
> > > > life cycle management, so their size will grow indefinitely. We will
> > > > likely need a mechanism to handle both.
> > > >
> > > > I am wondering what the community thinks about this. Should this be
> > > > part of the admin tool, where admins/ops make the call on when to
> > > > clean up, or should we have a janitor process that runs
> > > > automatically (users would need to provide rules on what to clean
> > > > up, such as a time-based TTL)?
> > > >
> > > > Thanks,
> > > > Yong Zheng
