Hi Yufei,

My general understanding from your PR [3340] was that unifying the Admin Tool and the Server in one binary runtime environment was preferred.

In any case, I'm open to any suggestions on how we arrange CLI commands and jar files.

What is time-sensitive in this discussion, from my POV, is deciding whether we can accept the NoSQL maintenance command into the Admin Tool now, so that the 1.4.0 release can be "well rounded" in terms of NoSQL functionality, that is, providing both the Persistence impl. and the related maintenance commands to end users in the same distribution.

As discussed in this thread, in the current state of the codebase the Admin Tool appears to be a natural fit for the NoSQL maintenance command. I propose we start with that for 1.4.0 and work on improvements / adjustments later on "main".

WDYT?

[3340] https://github.com/apache/polaris/pull/3340

Thanks,
Dmitri.

On Fri, Feb 13, 2026 at 12:35 AM Yufei Gu <[email protected]> wrote:

> > Anand's work on Scan Metrics Persistence [1] is likely going to require a Metrics maintenance command, which is going to apply to JDBC.
>
> The scan metrics maintenance command will likely be needed across most persistence impls, which fits into the same category as the bootstrap and purge commands.
>
> Given that some commands are persistence-specific, does it make sense to use the shell script as a dispatcher that selects the appropriate jar and main class based on the configured persistence backend? This could keep the CLI surface consistent while allowing backend specific implementations to remain modular.
>
> Yufei
>
>
> On Mon, Feb 9, 2026 at 6:38 AM Dmitri Bourlatchkov <[email protected]> wrote:
>
> > Hi Yufei,
> >
> > > - Allow operator facing, backend specific maintenance CLIs or commands to be assembled separately.
> >
> > Hi JB,
> >
> > (replying to both at the same time :) )
> >
> > > - improve CLI/admin tool to work with any persistence backend
> >
> > This is already the case, AFAIK. The Admin Tool takes the same config as the server with respect to Persistence (DataSource, etc.)
> > and will work with any Persistence supported by the server.
> >
> > However, some commands are applicable only to some persistence types. For example:
> >
> > * The NoSQL maintenance command applies to NoSQL Persistence
> >
> > * Anand's work on Scan Metrics Persistence [1] is likely going to require a Metrics maintenance command, which is going to apply to JDBC.
> >
> > Moreover, Metrics Persistence over JDBC is probably going to be able to work alongside NoSQL MetaStore (Entity Persistence), so all command combinations seem to be possible at this point. I would not want us to rule out specific options now, while these features are still young and subject to change.
> >
> > I believe it is preferable to offer usable tools to end users now (in 1.4.0), even though the commands may look confusing at first glance to some of them, over offering a distribution that is perfectly clean, but missing support for some valid use cases. Documentation can help here.
> >
> > I can see that sorting out Admin CLI concerns is beneficial in the long run; still, it is likely going to take considerable effort, while the NoSQL Maintenance feature is already usable.
> >
> > If we focus on that, are there strong disadvantages to adding NoSQL maintenance to the Admin Tool now and continuing improvement / clarification discussions on `main`?
> >
> > If we can align on this plan, I will start a new email thread about the Admin Tool vision and use cases (similar to what I sent about NoSQL Maintenance) to kick that discussion off.
> >
> > [1] https://lists.apache.org/thread/c83jnkvlwc2k3swm65cmvl4t0mt7p799
> >
> > Thanks,
> > Dmitri.
> >
> > On Mon, Feb 9, 2026 at 1:40 AM Jean-Baptiste Onofré <[email protected]> wrote:
> >
> > > Hi
> > >
> > > For me, polaris-tools repo hosts all "client" modules (MCP, Console, etc).
> > > At some point, Spark client could be in polaris-tools as well.
> > >
> > > As NoSQL is "part of the server", I don't think it makes sense to move it to polaris-tools.
> > > I do agree that maintenance could be seen as a bit of a client at some point.
> > >
> > > Short term, I propose to host everything (core/API + maintenance) in the NoSQL module.
> > > The "admin tool" and everything user facing should be persistence agnostic imho. It's not the same persona between the Polaris "deployer" and the Polaris "user".
> > >
> > > So, in order to move forward, I propose:
> > > - to keep everything (core + maintenance) in the NoSQL module
> > > - improve CLI/admin tool to work with any persistence backend
> > >
> > > Regards
> > > JB
> > >
> > > On Fri, Feb 6, 2026 at 2:13 AM Dmitri Bourlatchkov <[email protected]> wrote:
> > >
> > > > Hi Yufei,
> > > >
> > > > Thanks for sharing your perspective.
> > > >
> > > > Cross-module coupling is a valid concern, I agree.
> > > >
> > > > However, moving NoSQL maintenance code to polaris-tools will complicate the distribution, I think. As I mentioned in the Community Sync call today, Polaris users need to be able to control the size of the database and remove dangling data if they are to use NoSQL persistence effectively. Therefore, I believe a Polaris release with NoSQL Persistence needs to provide the corresponding maintenance tools. Now, if the tools were in the polaris-tools repository, they would have to be included into the service binary distribution, which is exactly the complication I mean.
> > > >
> > > > That said, I believe it should be possible to leverage CDI in the Admin Tool the same way we leverage CDI in the Server to allow custom plugins and extension points.
> > > > It will certainly take some follow-up work, but I hope we should be able to have "extended" admin commands in isolated source sub-modules and only assemble all dependencies at the tool build time. This should alleviate coupling concerns, I hope.
> > > >
> > > > The same should be possible even if we unify the Admin Tool and the Server Quarkus applications as you proposed in [3340]. I still think that PR deserves a refresh and a push forward.
> > > >
> > > > All in all, as far as I can see from GH comments and this discussion, the majority is leaning towards adding NoSQL maintenance commands to the Admin Tool. If you do not feel too strongly about this, perhaps we could merge [3395] to achieve a coherent user story for 1.4.0 and consider different approaches later. WDYT?
> > > >
> > > > [3340] https://github.com/apache/polaris/pull/3340
> > > >
> > > > [3395] https://github.com/apache/polaris/pull/3395
> > > >
> > > > Thanks,
> > > > Dmitri.
> > > >
> > > > On Wed, Feb 4, 2026 at 9:12 PM Yufei Gu <[email protected]> wrote:
> > > >
> > > > > Sorry for the late reply, and thanks Dmitri for the detailed rationale and everyone for the feedback so far.
> > > > > I agree that NoSQL maintenance is necessary, and I like the direction of making it explicit and well namespaced from a UX standpoint.
> > > > >
> > > > > The concerns I still have are the release/distribution and cross-module coupling. Even if the core maintenance APIs better stay close to the NoSQL persistence implementation, I think the metastore-specific maintenance tooling and its CLI surface should live in the polaris-tools repo.
> > > > > Specifically, the admin commands and any operator facing binaries for metastore-specific maintenance should be built and shipped from polaris-tools, not from the main repo distribution artifacts.
> > > > >
> > > > > Why I think this helps
> > > > > 1. It reduces release management overhead and keeps the main repo focused on the common parts of the server, admin and core modules.
> > > > > 2. It creates a scalable pattern as more metastore specific operational tooling appears, without growing a single monolithic admin surface over time.
> > > > > 3. It makes optionality cleaner for downstream users, they can choose to pull the tools package when running NoSQL persistence without forcing extra dependencies on users running other metastores.
> > > > >
> > > > > We still keep the necessary shared interfaces (e.g., bootstrap) in the main repo, so the tooling can track the metastore schema and evolve nicely, but the packaging and module boundary become clearer. Taking the current JDBC backend as an example, only runtime dependencies are required for JDBC impl. from the admin module, which provides a clean separation. I think we should preserve that here as well.
> > > > >
> > > > > Yufei
> > > > >
> > > > >
> > > > > On Fri, Jan 30, 2026 at 4:47 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > >
> > > > > > Another (hypothetical ATM) maintenance tool could be necessary for the scan metrics from Anand's proposal [1]
> > > > > >
> > > > > > [1] https://lists.apache.org/thread/c83jnkvlwc2k3swm65cmvl4t0mt7p799
> > > > > >
> > > > > > Cheers,
> > > > > > Dmitri.
> > > > > >
> > > > > > On 2026/01/16 20:01:06 Dmitri Bourlatchkov wrote:
> > > > > > > Hi Russell,
> > > > > > >
> > > > > > > Re: "other" maintenance tools, it's a bit off-topic here, but I've been thinking about separating schema management from "initial data management" (cf. [3446]).
> > > > > > >
> > > > > > > The schema evolution tasks are likely different for each database, while things like creating the first "root" principal should probably be the same for all backends.
> > > > > > >
> > > > > > > ... but it's a different topic really :)
> > > > > > >
> > > > > > > [3446] https://github.com/apache/polaris/pull/3446
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Dmitri.
> > > > > > >
> > > > > > > On Fri, Jan 16, 2026 at 2:50 PM Russell Spitzer <[email protected]> wrote:
> > > > > > >
> > > > > > > > CEL Comments make sense to me, personally I wouldn't use them in this context but that's just my personal bias :)
> > > > > > > >
> > > > > > > > I think nesting makes sense, I was kind of wondering if there were other maintenance tools planned?
> > > > > > > >
> > > > > > > > I was also thinking about the opposite nesting -
> > > > > > > >
> > > > > > > > nosql maintenance
> > > > > > > > or
> > > > > > > > nosql purge
> > > > > > > >
> > > > > > > > Just so folks know immediately whether or not there is something they need in the sub command
> > > > > > > >
> > > > > > > > On Fri, Jan 16, 2026 at 12:33 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi Russell,
> > > > > > > > >
> > > > > > > > > I agree that the plain "maintenance" Admin CLI command name is too generic in this context.
> > > > > > > > >
> > > > > > > > > I believe our existing Admin CLI tooling allows for command nesting. How about "maintenance nosql purge"?
> > > > > > > > >
> > > > > > > > > Please see my reply to Dennis about CEL expressions.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Dmitri.
> > > > > > > > >
> > > > > > > > > On Fri, Jan 16, 2026 at 1:05 PM Russell Spitzer <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > This is a great discussion,
> > > > > > > > > >
> > > > > > > > > > I think the maintenance particularities of NoSQL are fine to be managed by those with the expertise in that system. So I'm not particularly worried about it having some capabilities that are not explicitly present in other storage layers.
> > > > > > > > > > (Special DR, explicit GC, or whatnot)
> > > > > > > > > >
> > > > > > > > > > I do agree with some of Dennis' concerns about the actual CLI integration just because I would want to make clear what commands are generally applicable and those which apply to a single persistence layer. In my mind, "noSql" is a Database in itself and to me that would mean its maintenance is not really a part of generic Polaris administration. Similar to how we don't expect Polaris to have commands that do cleanup for Postgres or whatnot. That said I understand it would be much easier from an end user standpoint if there weren't multiple tools so I'm not opposed to including it with appropriate namespacing. Keeping it as just a base "maintenance" seems a bit misleading.
> > > > > > > > > >
> > > > > > > > > > The exposure of CEL expressions to the client does seem to be a bit of a bigger issue, that does seem to be a lot of power for what has a rather limited set of valid settings? This is more of a personal coding opinion, but I generally want to limit the range of possible inputs whenever possible (and even remove options entirely if it doesn't make sense for an end user to change them.)
> > > > > > > > > > It looks like at the moment the only option is "run"?
> > > > > > > > > >
> > > > > > > > > > On Fri, Jan 16, 2026 at 2:12 AM Dennis Huo <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks Dmitri for kicking off this thread!
> > > > > > > > > > >
> > > > > > > > > > > I think even just laying out the design considerations in the form of a Q&A like you did here is great as a supplemental design artifact for posterity and this helps address the "documentation" questions I brought up in https://github.com/apache/polaris/pull/3268
> > > > > > > > > > >
> > > > > > > > > > > Personally I'm okay with having it in the main shared admin tool, as long as we can do it in a way that avoids "monolithic code" scaling issues that can come up as the set of backend-specific things grows. I guess this is a good opportunity to start establishing the precedent for how to structure:
> > > > > > > > > > >
> > > > > > > > > > > 1. Hierarchical command syntax? Would it be like "java -jar admin-tool nosql maintenance garbage-collect --cel-expression='ageDays < 30'"? Or "java -jar admin-tool maintenance nosql garbage-collect" (maintenance before nosql, or is maintenance specific to nosql?
> > > > > > > > > > > would we collect common maintenance commands that are persistence-agnostic into the base maintenance subcommand?)
> > > > > > > > > > > 2. Should we have compile-time options that can choose which subfeatures to build in case there are issues with some subfeature that aren't applicable to the user?
> > > > > > > > > > > 3. Should we lay out the code for easy segregation as we scale? We may not want one directory that contains a SpannerMaintenance, an AliyunMaintenance, FoundationDbMaintenance, etc all next to each other
> > > > > > > > > > >
> > > > > > > > > > > I think layout aspects could probably be addressed in an incremental way though, so at least I don't have any hard stance on what's the right answer, as long as we're flexible in willingness to change the syntax to be more nested/organized when we see the need.
> > > > > > > > > > >
> > > > > > > > > > > For CEL, I do think it's trickier to evolve it to *take away* expressiveness in the future if we let the cat out of the bag to allow too expressive a language initially, since it pertains to the *semantics* of what people running the NoSQL impl come to depend on, beyond just *syntax* (i.e., it's somewhat easier to change the CLI's syntax to introduce a nesting like "admin-tool nosql maintenance garbage-collect" if the underlying functionality is the same, but if someone decides to start depending on being able to runtime-specify CEL expressions like 'getDayOfWeek(commitTime) == FRIDAY' it's hard to go back to a simpler world where we didn't have to deal with that).
> > > > > > > > > > >
> > > > > > > > > > > Note, I might be exaggerating my assumption about what the CEL expression supports here since I don't remember offhand the details about the part of the code that consumes it and I couldn't find docs on what we expect the structure of the input to the expression to be and what kind of CEL expressions are actually allowed.
> > > > > > > > > > >
> > > > > > > > > > > Your clarification that Polaris *Servers* won't need CEL on the classpath does help assuage my concerns about having it as a heavyweight dependency somewhat, but I think it's still prudent to know whether the intended use cases are a substantially more restrictive set of conditions (probably minimumNumToKeep and maxAge, right?).
> > > > > > > > > > >
> > > > > > > > > > > If we capture the pros/cons it'll help our future selves not have to redo the work in considering expressiveness vs precision/clarity of interface if someone tries to evolve the interface again in the future. I guess an argument in favor of CEL is that it's cumbersome/messier enough trying to express a combined numToKeep and maxAge condition in terms of multiple different config values that interact.
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jan 15, 2026 at 3:15 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Prashant,
> > > > > > > > > > > >
> > > > > > > > > > > > You bring up valid points. Apologies for not explaining them upfront.
> > > > > > > > > > > > I suppose it's human nature to take things for granted when you've been working with them for a while :)
> > > > > > > > > > > >
> > > > > > > > > > > > The need to retain more than just the latest state of the catalog is primarily rooted in Disaster Recovery scenarios (specifically with NoSQL persistence).
> > > > > > > > > > > >
> > > > > > > > > > > > In short, a DR situation may leave the latest state unusable (e.g. due to replication lag... exact failures are kind of complex and probably require a separate discussion), so the admin user may have to reset the catalog to a previous state. This would be a data loss situation, of course, but it may be the best option to recover some data compared to total loss.
> > > > > > > > > > > >
> > > > > > > > > > > > This is not actualized as specific user-level tools in OSS yet. Full DR support requires considerable follow-up work.
> > > > > > > > > > > >
> > > > > > > > > > > > Whether the flexibility provided by CEL is really required for end users can probably be debated. Let me think more about that.
> > > > > > > > > > > >
> > > > > > > > > > > > Re: CEL java maintainability, the Nessie CEL implements a particular version of the CEL spec and passes Google's conformance tests.
> > > > > > > > > > > > Therefore, it is a correct CEL impl. Whether it needs to adopt newer spec revisions is not really a maintenance burden in Polaris unless we want to always use the latest CEL spec, which IMHO is not a requirement as the supported version is already pretty expressive. Please consider that CEL is engaged only when the user performs NoSQL maintenance, otherwise it is just a jar inside the Admin Tool. Polaris Servers should not need CEL on the class path, AFAIK.
> > > > > > > > > > > >
> > > > > > > > > > > > Re: sync vs. async maintenance, sync cannot be reliable if you assume that any node can be killed at any time (which is the reality in k8s).
> > > > > > > > > > > >
> > > > > > > > > > > > Re: exposing NoSQL-specific commands in the Admin Tool, I personally think it is similar to supporting different storage technologies in the Catalog config (e.g. GCS vs. S3). Polaris CLI has a multitude of options for the union of them, but not all features of one storage type are applicable to others.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Dmitri.
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jan 15, 2026 at 1:08 PM Prashant Singh via dev <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thank you for starting the thread Dmitri!
> > > > > > > > > > > > > Thank you Pierre for the response, I certainly missed this section of the design document.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I believe I was expecting a design doc explaining why we want to selectively retain the entities which are not the current version, as if the NoSQL implementation cares about this; is there any design for this? Secondly, as proposed in the doc we should just be cleaning all the entities that are not current, so I am unsure why we want to have age>=30 days kind of retention? If we selectively want to retain, we need to have a design doc for it to explain use cases, agree on user facing constructs and others. For example, a possible interpretation is: can I go back to the state of the catalog as of 30 days ago?
> > > > > > > > > > > > > I don't think Polaris supports undrop or time travel, and I don't think JDBC will be able to support it, so I believe NoSQL's *default* behaviour should be to delete everything that's not current.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I can see the admin tool mentioned, but what I can't see in the presentation is this whole module, the design trade-off of sync vs async maintenance, and user specific constructs, for example the retention expression and why it is required. I believe those things warrant a design for themselves, is my take.
> > > > > > > > > > > > >
> > > > > > > > > > > > > With that being said I totally understand NoSQL requires maintenance; what I fail to understand is why NoSQL requires retention expressions. Why can't everything that's not current be marked as a GC candidate? If the issue is that we need this for debugging, then we should just have a simple config saying keep the latest X commits. To me it feels like we are opening up to cases such as time travel and undrop without broader agreement with the community.
> > > > > > > > > > > > > If we want to do these additional things and expose these extra constructs, which I think are good to do, they can't be part of the polaris repo but would be a good tool for polaris goodies.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hence the request to open the discussion in the thread, as well as to have a debate on where this tool would live. The Admin Tool presently just has bootstrap and purge, which are supported by both persistence implementations, but maintenance is NoSQL specific and there is no way to do this in JDBC. IMHO it would be very confusing for an end user to see that they can't retain their catalog state as of 30 days ago in JDBC but can in NoSQL, so leaking this to the Admin Tool is, IMHO, not a good idea. I am open to hearing from others on why it is and how this concern is handled!
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regarding the expression language introduction (I humbly disagree that we need one), I went up to the 8th page of projectnessie/cel-java [1]; it has just done dependency updates, whereas google/cel-java is something Google developers are actively working on. cel-java is Google's spec, so I would rather use google/cel-java than have a third-party dependency implementing the same spec which Google owns.
> > > > > > > > > > > > >
> > > > > > > > > > > > > With that being said I am open to hearing from others as to why such constructs should be present in NoSQL, especially retained stuff age <= 30?
> > > > > > > > > > > > >
> > > > > > > > > > > > > On an orthogonal note: it would have been better if we had had these discussions before we merged the PR.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you again Dmitri for starting this conversation, I really appreciate it!
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] https://github.com/apache/polaris/pull/3268#pullrequestreview-3576273215
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Prashant Singh
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jan 15, 2026 at 2:24 AM Pierre Laporte <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Dmitri, thanks for the comprehensive recap.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For "the newly added Maintenance module was not exposed in previous docs related to NoSQL", I wonder whether this is just a misunderstanding. As Prashant noted, in the NoSQL presentation that was run a couple of times by Adam [1], there is a mention of "A maintenance task in the Admin CLI Tool". And the original design doc [2] also contains an explanation as to why this is necessary for NoSQL in the "Handling no longer needed objects" section. Am I missing something?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regarding the repository choice, I would like to emphasize the potential overhead in release management.
> > > > > > > > > > > > > > Today, we have a manual release process that only spans the `apache/polaris` repository. We also have a semi-automated release process that is tightly coupled with the `apache/polaris` repository, because it is implemented as GitHub workflows within that repository. Let's consider the potential impacts on release process and cadence.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] https://docs.google.com/presentation/d/1lX2EdvM0SeyuOdO_u1idlWfmnlH3hFE16JEyWo45Bdo/edit?slide=id.p24#slide=id.p24
> > > > > > > > > > > > > > [2] https://docs.google.com/document/d/1POUWe0xMZOBoaJ6Rgiw35ziEoc6OEYCiW7Zk6bR9H6M/edit?tab=t.0#heading=h.ccj3ewbhhhhy
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Pierre
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Jan 14, 2026 at 11:18 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > As Prashant mentioned in GH [1], the newly added Maintenance module was not exposed in previous docs related to NoSQL. Let's use this email thread to discuss it and possible concerns people may have. Below, I'm providing the rationale for the topics of which I am aware. Please feel free to start new threads dedicated to other concerns. Let's keep this discussion focused on the NoSQL maintenance functionality, though.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > * Why is this code necessary?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > NoSQL persistence is not transactional. Even normal commits leave some amount of historical data in the database. Failed commits may leave remnants of preparatory data in the database too.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If not cleaned up, this will lead to virtually indefinite growth of persisted data over time.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Therefore, some periodic async cleanup is necessary. The maintenance code in PR [3268] provides fundamental code for performing this cleanup.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > * Why does it have to be in the main repo?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The code in PR [3268] has to align tightly with the actual NoSQL Persistence implementation. It has to evolve in sync with the data model of stored data.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Therefore, it is logical to keep it in the same repo as the mainstream NoSQL Persistence code.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > * Why is CEL required?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > CEL was chosen based on prior work when the NoSQL Persistence was developed in private. It provides an efficient and expressive medium for admin users to define NoSQL maintenance policies.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > * Why is the Nessie CEL java impl. used?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The Nessie CEL java impl. predates the Google impl. and has been used in production for years under various projects (including Nessie itself). The developers of the NoSQL persistence are more certain of the runtime behavior of the Nessie CEL impl. than of Google's. Switching to Google's CEL java requires additional work.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > * Can we express maintenance policies in some other, non-CEL way?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Generally yes. However, this requires extra work and analysis of UX impact. If anyone has a concrete proposal for non-CEL maintenance policies, ideas / PRs are welcome for discussion, of course.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > * Why does the Admin Tool have to have maintenance commands [3395]?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This is to allow users of Apache Polaris binary distributions to perform maintenance should they choose NoSQL Persistence. The Admin Tool is a natural home for the maintenance CLI because it is in fact intended to perform direct manipulation of the Polaris database, such as creating the schema and bootstrapping realms (existing functionality).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > * Can the maintenance command [3395] live in the polaris-tools repo?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This would effectively require the Admin Tool to live in polaris-tools, which seems to be against the recent move to unify Admin and Service binaries [3340].
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > * Can the maintenance code be invoked in some other way (non-Admin-CLI)?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes. For example, it is possible to build docker images dedicated to running the maintenance tasks without using the Admin CLI. This is not implemented in Apache Polaris yet. The Admin CLI appears to offer the best UX for admin users with minimal developer effort.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [1] https://github.com/apache/polaris/pull/3268#pullrequestreview-3576273215
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [3340] https://github.com/apache/polaris/pull/3340
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [3268] https://github.com/apache/polaris/pull/3268
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [3395] https://github.com/apache/polaris/pull/3395
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thoughts? Comments?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > Dmitri.
