Hi Yufei,

Thanks for sharing your perspective.

Cross-module coupling is a valid concern, I agree.

However, moving the NoSQL maintenance code to polaris-tools would complicate
the distribution, I think. As I mentioned in the Community Sync call today,
Polaris users need to be able to control the size of the database and
remove dangling data if they are to use NoSQL persistence effectively.
Therefore, I believe a Polaris release with NoSQL Persistence needs to
provide the corresponding maintenance tools. Now, if the tools were in the
polaris-tools repository, they would have to be included in the service
binary distribution, which is exactly the complication I mean.

That said, I believe it should be possible to leverage CDI in the Admin
Tool the same way we leverage CDI in the Server to allow custom plugins and
extension points. It will certainly take some follow-up work, but I hope we
will be able to have "extended" admin commands in isolated source
sub-modules and only assemble all dependencies at tool build time. This
should alleviate the coupling concerns.

The same should be possible even if we unify the Admin Tool and the Server
Quarkus applications as you proposed in [3340]. I still think that PR
deserves a refresh and a push forward.

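To illustrate the extension-point idea above, here is a minimal Java sketch, assuming a hypothetical `AdminCommandProvider` interface that optional sub-modules (such as NoSQL maintenance) would implement. Nothing here is existing Polaris code; `SimpleCommand` and the space-separated command paths are illustrative too. In a CDI-based tool, the constructor could receive an injected `jakarta.enterprise.inject.Instance<AdminCommandProvider>`, which is `Iterable`, so only the modules present at build time would contribute commands.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

final class AdminToolSketch {

    // Hypothetical extension point: core code knows only this interface,
    // optional sub-modules ship implementations of it.
    interface AdminCommandProvider {
        String path();              // space-separated path, e.g. "maintenance nosql purge"
        int run(List<String> args); // returns a process exit code
    }

    // Hypothetical convenience implementation used in the example.
    record SimpleCommand(String path, ToIntFunction<List<String>> action)
            implements AdminCommandProvider {
        public int run(List<String> args) {
            return action.applyAsInt(args);
        }
    }

    private final Map<String, AdminCommandProvider> commands = new TreeMap<>();

    // A CDI Instance<AdminCommandProvider> is Iterable, so it could be passed here.
    AdminToolSketch(Iterable<? extends AdminCommandProvider> providers) {
        for (AdminCommandProvider p : providers) {
            if (commands.putIfAbsent(p.path(), p) != null) {
                throw new IllegalStateException("Duplicate command: " + p.path());
            }
        }
    }

    // Dispatches to the longest registered command path that prefixes argv;
    // the remaining arguments are handed to that command.
    int dispatch(List<String> argv) {
        for (int n = argv.size(); n > 0; n--) {
            AdminCommandProvider p = commands.get(String.join(" ", argv.subList(0, n)));
            if (p != null) {
                return p.run(argv.subList(n, argv.size()));
            }
        }
        System.err.println("Unknown command. Available: " + commands.keySet());
        return 1;
    }
}
```

With a NoSQL module on the classpath, `maintenance nosql purge` resolves to its provider; without it, the same invocation just reports an unknown command, which matches the build-time assembly described above.
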
All in all, as far as I can see from GH comments and this discussion, the
majority is leaning towards adding NoSQL maintenance commands to the Admin
Tool. If you do not feel too strongly about this, perhaps we could merge
[3395] to achieve a coherent user story for 1.4.0 and consider different
approaches later. WDYT?

[3340] https://github.com/apache/polaris/pull/3340

[3395] https://github.com/apache/polaris/pull/3395

Thanks,
Dmitri.

On Wed, Feb 4, 2026 at 9:12 PM Yufei Gu <[email protected]> wrote:

> Sorry for the late reply, and thanks Dmitri for the detailed rationale and
> everyone for the feedback so far.
> I agree that NoSQL maintenance is necessary, and I like the direction of
> making it explicit and well namespaced from a UX standpoint.
>
> The concerns I still have are the release/distribution and cross-module
> coupling. Even if the core maintenance APIs are better kept close to the
> NoSQL persistence implementation, I think the metastore-specific
> maintenance tooling and its CLI surface should live in the polaris-tools
> repo.
> Specifically, the admin commands and any operator facing binaries for
> metastore-specific maintenance should be built and shipped from
> polaris-tools, not from the main repo distribution artifacts.
>
> Why I think this helps:
> 1. It reduces release management overhead and keeps the main repo focused
> on the common parts of the server, admin and core modules.
> 2. It creates a scalable pattern as more metastore specific operational
> tooling appears, without growing a single monolithic admin surface over
> time.
> 3. It makes optionality cleaner for downstream users: they can choose to
> pull the tools package when running NoSQL persistence without forcing extra
> dependencies on users running other metastores.
>
> We still keep the necessary shared interfaces (e.g., bootstrap) in the main
> repo, so the tooling can track the metastore schema and evolve nicely, but
> the packaging and module boundary become clearer. Taking the current JDBC
> backend as an example, only runtime dependencies are required for JDBC
> impl. from the admin module, which provides a clean separation. I think we
> should preserve that here as well.
>
> Yufei
>
>
> On Fri, Jan 30, 2026 at 4:47 PM Dmitri Bourlatchkov <[email protected]> wrote:
>
> > Another (hypothetical ATM) maintenance tool could be necessary for the
> > scan metrics from Anand's proposal [1].
> >
> > [1] https://lists.apache.org/thread/c83jnkvlwc2k3swm65cmvl4t0mt7p799
> >
> > Cheers,
> > Dmitri.
> >
> > On 2026/01/16 20:01:06 Dmitri Bourlatchkov wrote:
> > > Hi Russell,
> > >
> > > Re: "other" maintenance tools, it's a bit off-topic here, but I've been
> > > thinking about separating schema management from "initial data
> > > management" (cf. [3446]).
> > >
> > > The schema evolution tasks are likely different for each database, while
> > > things like creating the first "root" principal should probably be the
> > > same for all backends.
> > >
> > > ... but it's a different topic really :)
> > >
> > > [3446] https://github.com/apache/polaris/pull/3446
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Fri, Jan 16, 2026 at 2:50 PM Russell Spitzer <[email protected]> wrote:
> > >
> > > > The CEL comments make sense to me; personally I wouldn't use them in
> > > > this context, but that's just my personal bias :)
> > > >
> > > > I think nesting makes sense. I was kind of wondering if there were
> > > > other maintenance tools planned?
> > > >
> > > > I was also thinking about the opposite nesting:
> > > >
> > > > nosql maintenance
> > > > or
> > > > nosql purge
> > > >
> > > > Just so folks know immediately whether or not there is something they
> > > > need in the subcommand.
> > > >
> > > > On Fri, Jan 16, 2026 at 12:33 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > >
> > > > > Hi Russell,
> > > > >
> > > > > I agree that the plain "maintenance" Admin CLI command name is too
> > > > > generic in this context.
> > > > >
> > > > > I believe our existing Admin CLI tooling allows for command nesting.
> > > > > How about "maintenance nosql purge"?
> > > > >
> > > > > Please see my reply to Dennis about CEL expressions.
> > > > >
> > > > > Cheers,
> > > > > Dmitri.
> > > > >
> > > > > On Fri, Jan 16, 2026 at 1:05 PM Russell Spitzer <[email protected]> wrote:
> > > > >
> > > > > > This is a great discussion,
> > > > > >
> > > > > > I think the maintenance particularities of NoSQL are fine to be
> > > > > > managed by those with expertise in that system. So I'm not
> > > > > > particularly worried about it having some capabilities that are not
> > > > > > explicitly present in other storage layers (special DR, explicit GC,
> > > > > > or whatnot).
> > > > > >
> > > > > > I do agree with some of Dennis' concerns about the actual CLI
> > > > > > integration, just because I would want to make clear which commands
> > > > > > are generally applicable and which apply to a single persistence
> > > > > > layer. In my mind, "NoSQL" is a database in itself, and to me that
> > > > > > would mean its maintenance is not really a part of generic Polaris
> > > > > > administration, similar to how we don't expect Polaris to have
> > > > > > commands that do cleanup for Postgres or whatnot. That said, I
> > > > > > understand it would be much easier from an end user standpoint if
> > > > > > there weren't multiple tools, so I'm not opposed to including it with
> > > > > > appropriate namespacing. Keeping it as just a base "maintenance"
> > > > > > command seems a bit misleading.
> > > > > >
> > > > > > The exposure of CEL expressions to the client does seem to be a bit
> > > > > > of a bigger issue; that seems like a lot of power for what has a
> > > > > > rather limited set of valid settings. This is more of a personal
> > > > > > coding opinion, but I generally want to limit the range of possible
> > > > > > inputs whenever possible (and even remove options entirely if it
> > > > > > doesn't make sense for an end user to change them). It looks like at
> > > > > > the moment the only option is "run"?
> > > > > >
> > > > > > On Fri, Jan 16, 2026 at 2:12 AM Dennis Huo <[email protected]> wrote:
> > > > > >
> > > > > > > Thanks Dmitri for kicking off this thread!
> > > > > > >
> > > > > > > I think even just laying out the design considerations in the form
> > > > > > > of a Q&A like you did here is great as a supplemental design
> > > > > > > artifact for posterity, and this helps address the "documentation"
> > > > > > > questions I brought up in
> > > > > > > https://github.com/apache/polaris/pull/3268
> > > > > > >
> > > > > > > Personally I'm okay with having it in the main shared admin tool,
> > > > > > > as long as we can do it in a way that avoids "monolithic code"
> > > > > > > scaling issues that can come up as the set of backend-specific
> > > > > > > things grows. I guess this is a good opportunity to start
> > > > > > > establishing the precedent for how to structure:
> > > > > > >
> > > > > > > 1. Hierarchical command syntax? Would it be like "java -jar
> > > > > > > admin-tool nosql maintenance garbage-collect
> > > > > > > --cel-expression='ageDays < 30'"? Or "java -jar admin-tool
> > > > > > > maintenance nosql garbage-collect" (maintenance before nosql, or is
> > > > > > > maintenance specific to nosql? would we collect common maintenance
> > > > > > > commands that are persistence-agnostic into the base maintenance
> > > > > > > subcommand?)
> > > > > > > 2. Should we have compile-time options that can choose which
> > > > > > > subfeatures to build, in case there are issues with some
> > > > > > > subfeatures that aren't applicable to the user?
> > > > > > > 3. Should we lay out the code for easy segregation as we scale? We
> > > > > > > may not want one directory that contains a SpannerMaintenance, an
> > > > > > > AliyunMaintenance, a FoundationDbMaintenance, etc. all next to each
> > > > > > > other.
> > > > > > >
> > > > > > > I think layout aspects could probably be addressed in an
> > > > > > > incremental way, though, so at least I don't have any hard stance
> > > > > > > on what's the right answer, as long as we're flexible in our
> > > > > > > willingness to change the syntax to be more nested/organized when
> > > > > > > we see the need.
> > > > > > >
> > > > > > > For CEL, I do think it's trickier to evolve it to *take away*
> > > > > > > expressiveness in the future if we let the cat out of the bag by
> > > > > > > allowing too expressive a language initially, since it pertains to
> > > > > > > the *semantics* of what people running the NoSQL impl come to
> > > > > > > depend on, beyond just *syntax* (i.e., it's somewhat easier to
> > > > > > > change the CLI's syntax to introduce a nesting like "admin-tool
> > > > > > > nosql maintenance garbage-collect" if the underlying functionality
> > > > > > > is the same, but if someone decides to start depending on being
> > > > > > > able to runtime-specify CEL expressions like
> > > > > > > 'getDayOfWeek(commitTime) == FRIDAY', it's hard to go back to a
> > > > > > > simpler world where we didn't have to deal with that).
> > > > > > >
> > > > > > > Note, I might be exaggerating my assumption about what the CEL
> > > > > > > expression supports here, since I don't remember offhand the
> > > > > > > details of the part of the code that consumes it, and I couldn't
> > > > > > > find docs on what we expect the structure of the input to the
> > > > > > > expression to be and what kind of CEL expressions are actually
> > > > > > > allowed.
> > > > > > >
> > > > > > > Your clarification that Polaris *Servers* won't need CEL on the
> > > > > > > classpath does help assuage my concerns about having it as a
> > > > > > > heavyweight dependency somewhat, but I think it's still prudent to
> > > > > > > know whether the intended use cases are a substantially more
> > > > > > > restrictive set of conditions (probably minimumNumToKeep and
> > > > > > > maxAge, right?).
> > > > > > >
> > > > > > > If we capture the pros/cons, it'll help our future selves not have
> > > > > > > to redo the work of weighing expressiveness vs. precision/clarity
> > > > > > > of the interface if someone tries to evolve the interface again in
> > > > > > > the future. I guess an argument in favor of CEL is that it's
> > > > > > > cumbersome/messier trying to express a combined numToKeep and
> > > > > > > maxAge condition in terms of multiple different config values that
> > > > > > > interact.
> > > > > > >
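To make the trade-off Dennis raises concrete (a fixed set of parameters vs. a CEL expression), here is a minimal Java sketch of a retention rule built only from `minimumNumToKeep` and `maxAge`. The `Commit` type and the OR-combination of the two limits are assumptions for illustration, not the NoSQL persistence's actual data model or semantics.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RetentionSketch {

    // Hypothetical commit metadata; the real NoSQL data model is not described here.
    record Commit(String id, Instant createdAt) {}

    // Hypothetical fixed-parameter policy. The parameter names come from the
    // discussion; combining them with OR is an assumption of this sketch.
    record RetentionPolicy(int minimumNumToKeep, Duration maxAge) {

        // Returns the commits to keep, newest first; all others would be purge candidates.
        List<Commit> retained(List<Commit> commits, Instant now) {
            List<Commit> newestFirst = new ArrayList<>(commits);
            newestFirst.sort(Comparator.comparing(Commit::createdAt).reversed());
            Instant cutoff = now.minus(maxAge);
            List<Commit> keep = new ArrayList<>();
            for (int i = 0; i < newestFirst.size(); i++) {
                Commit c = newestFirst.get(i);
                // Keep the newest N commits, plus any commit younger than maxAge.
                if (i < minimumNumToKeep || c.createdAt().isAfter(cutoff)) {
                    keep.add(c);
                }
            }
            return keep;
        }
    }
}
```

Even this small rule has to pick how the two limits interact (OR here; AND would be equally plausible), which is the kind of interaction Dennis mentions. A CEL expression lets the operator decide, at the cost of a wider user-facing surface.
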
> > > > > > > On Thu, Jan 15, 2026 at 3:15 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Prashant,
> > > > > > > >
> > > > > > > > You bring up valid points. Apologies for not explaining them
> > > > > > > > upfront. I suppose it's human nature to take things for granted
> > > > > > > > when you've been working with them for a while :)
> > > > > > > >
> > > > > > > > The need to retain more than just the latest state of the catalog
> > > > > > > > is primarily rooted in Disaster Recovery scenarios (specifically
> > > > > > > > with NoSQL persistence).
> > > > > > > >
> > > > > > > > In short, a DR situation may leave the latest state unusable
> > > > > > > > (e.g. due to replication lag... exact failures are kind of
> > > > > > > > complex and probably require a separate discussion), so the admin
> > > > > > > > user may have to reset the catalog to a previous state. This would
> > > > > > > > be a data loss situation, of course, but it may be the best option
> > > > > > > > to recover some data compared to total loss.
> > > > > > > >
> > > > > > > > This is not yet implemented as specific user-level tools in OSS.
> > > > > > > > Full DR support requires considerable follow-up work.
> > > > > > > >
> > > > > > > > Whether the flexibility provided by CEL is really required for
> > > > > > > > end users can probably be debated. Let me think more about that.
> > > > > > > >
> > > > > > > > Re: CEL java maintainability, Nessie CEL implements a particular
> > > > > > > > version of the CEL spec and passes Google's conformance tests.
> > > > > > > > Therefore, it is a correct CEL impl. Whether it needs to adopt
> > > > > > > > newer spec revisions is not really a maintenance burden in
> > > > > > > > Polaris unless we want to always use the latest CEL spec, which
> > > > > > > > IMHO is not a requirement, as the supported version is already
> > > > > > > > pretty expressive. Please consider that CEL is engaged only when
> > > > > > > > the user performs NoSQL maintenance; otherwise it is just a jar
> > > > > > > > inside the Admin Tool. Polaris Servers should not need CEL on the
> > > > > > > > class path, AFAIK.
> > > > > > > >
> > > > > > > > Re: sync vs. async maintenance, sync cannot be reliable if you
> > > > > > > > assume that any node can be killed at any time (which is the
> > > > > > > > reality in k8s).
> > > > > > > >
> > > > > > > > Re: exposing NoSQL-specific commands in the Admin Tool, I
> > > > > > > > personally think it is similar to supporting different storage
> > > > > > > > technologies in the Catalog config (e.g. GCS vs. S3). The Polaris
> > > > > > > > CLI has a multitude of options for the union of them, but not all
> > > > > > > > features of one storage type are applicable to others.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Dmitri.
> > > > > > > >
> > > > > > > > On Thu, Jan 15, 2026 at 1:08 PM Prashant Singh via dev <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Thank you for starting the thread, Dmitri!
> > > > > > > > > Thank you, Pierre, for the response. I certainly missed this
> > > > > > > > > section of the design document.
> > > > > > > > >
> > > > > > > > > I was expecting a design doc explaining why we want to
> > > > > > > > > selectively retain entities that are not the current version.
> > > > > > > > > If the NoSQL implementation cares about this, is there any
> > > > > > > > > design for it? Secondly, as proposed in the doc, we should just
> > > > > > > > > be cleaning all the entities that are not current, so I am
> > > > > > > > > unsure why we want an "age>=30 days" kind of retention. If we
> > > > > > > > > want to retain selectively, we need a design doc to explain the
> > > > > > > > > use cases and to agree on user-facing constructs and so on. For
> > > > > > > > > example, a possible interpretation is: can I go back to the
> > > > > > > > > state of the catalog as of 30 days ago? I don't think Polaris
> > > > > > > > > supports undrop or time travel, and I don't think JDBC will be
> > > > > > > > > able to support it, so I believe NoSQL's *default* behaviour
> > > > > > > > > should be to delete everything that's not current.
> > > > > > > > >
> > > > > > > > > I can see the admin tool mentioned, but what I can't see in the
> > > > > > > > > presentation is this whole module, the design trade-off of sync
> > > > > > > > > vs. async maintenance, or user-facing constructs such as the
> > > > > > > > > retention expression and why it is required. My take is that
> > > > > > > > > those things warrant a design of their own.
> > > > > > > > >
> > > > > > > > > That being said, I totally understand that NoSQL requires
> > > > > > > > > maintenance. What I fail to understand is why NoSQL requires
> > > > > > > > > retention expressions. Why can't everything that's not current
> > > > > > > > > be marked as a GC candidate? If the issue is that we need this
> > > > > > > > > for debugging, then we should just have a simple config saying
> > > > > > > > > "keep the latest X commits". To me it feels like we are opening
> > > > > > > > > the door to cases such as time travel and undrop without broader
> > > > > > > > > agreement with the community. If we want to do these additional
> > > > > > > > > things and expose these extra constructs, which I think are good
> > > > > > > > > to do, they can't be part of the polaris repo, but they would
> > > > > > > > > make a good tool for polaris goodies.
> > > > > > > > >
> > > > > > > > > Hence my request to open the discussion in this thread, as well
> > > > > > > > > as to debate where this tool should live. The Admin Tool
> > > > > > > > > presently has just bootstrap and purge, which are supported by
> > > > > > > > > both persistence backends, but maintenance is NoSQL-specific and
> > > > > > > > > has no JDBC equivalent. IMHO it would be very confusing for end
> > > > > > > > > users to find that they can't retain their catalog state as of
> > > > > > > > > 30 days ago in JDBC but can in NoSQL, so leaking this into the
> > > > > > > > > Admin Tool is not a good idea. But I am open to hearing from
> > > > > > > > > others on why it is, and how this concern is handled!
> > > > > > > > >
> > > > > > > > > Regarding the introduction of an expression language (I humbly
> > > > > > > > > disagree that we need one): I went through 8 pages of
> > > > > > > > > projectnessie/cel-java [1], and it has had only dependency
> > > > > > > > > updates, whereas google/cel-java is actively worked on by Google
> > > > > > > > > developers. CEL is Google's spec, so I would rather use
> > > > > > > > > google/cel-java than have a third-party dependency implementing
> > > > > > > > > the same spec that Google owns.
> > > > > > > > >
> > > > > > > > > That being said, I am open to hearing from others as to why such
> > > > > > > > > constructs should be present in NoSQL, especially retaining
> > > > > > > > > stuff with age <= 30.
> > > > > > > > >
> > > > > > > > > On an orthogonal note: it would have been better if we had had
> > > > > > > > > these discussions before we merged the PR.
> > > > > > > > >
> > > > > > > > > Thank you again, Dmitri, for starting this conversation. I
> > > > > > > > > really appreciate it!
> > > > > > > > >
> > > > > > > > > [1] https://github.com/apache/polaris/pull/3268#pullrequestreview-3576273215
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Prashant Singh
> > > > > > > > >
> > > > > > > > > On Thu, Jan 15, 2026 at 2:24 AM Pierre Laporte <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Dmitri, thanks for the comprehensive recap.
> > > > > > > > > >
> > > > > > > > > > For "the newly added Maintenance module was not exposed in
> > > > > > > > > > previous docs related to NoSQL", I wonder whether this is just
> > > > > > > > > > a misunderstanding. As Prashant noted, in the NoSQL
> > > > > > > > > > presentation that was run a couple of times by Adam [1], there
> > > > > > > > > > is a mention of "A maintenance task in the Admin CLI Tool". And
> > > > > > > > > > the original design doc [2] also contains an explanation as to
> > > > > > > > > > why this is necessary for NoSQL in the "Handling no longer
> > > > > > > > > > needed objects" section. Am I missing something?
> > > > > > > > > >
> > > > > > > > > > Regarding the repository choice, I would like to emphasize the
> > > > > > > > > > potential overhead in release management. Today, we have a
> > > > > > > > > > manual release process that only spans the `apache/polaris`
> > > > > > > > > > repository. And we have a semi-automated release process that
> > > > > > > > > > is tightly coupled with the `apache/polaris` repository,
> > > > > > > > > > tightly coupled because it is implemented as GitHub workflows
> > > > > > > > > > within that repository. Let's consider the potential impacts on
> > > > > > > > > > release process and cadence.
> > > > > > > > > >
> > > > > > > > > > [1] https://docs.google.com/presentation/d/1lX2EdvM0SeyuOdO_u1idlWfmnlH3hFE16JEyWo45Bdo/edit?slide=id.p24#slide=id.p24
> > > > > > > > > > [2] https://docs.google.com/document/d/1POUWe0xMZOBoaJ6Rgiw35ziEoc6OEYCiW7Zk6bR9H6M/edit?tab=t.0#heading=h.ccj3ewbhhhhy
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > Pierre
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, Jan 14, 2026 at 11:18 PM Dmitri Bourlatchkov <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi All,
> > > > > > > > > > >
> > > > > > > > > > > As Prashant mentioned in GH [1], the newly added Maintenance
> > > > > > > > > > > module was not exposed in previous docs related to NoSQL.
> > > > > > > > > > > Let's use this email thread to discuss it and any concerns
> > > > > > > > > > > people may have. Below, I'm providing the rationale for the
> > > > > > > > > > > topics I am aware of. Please feel free to start new threads
> > > > > > > > > > > dedicated to other concerns. Let's keep this discussion focused
> > > > > > > > > > > on the NoSQL maintenance functionality, though.
> > > > > > > > > > >
> > > > > > > > > > > * Why is this code necessary?
> > > > > > > > > > >
> > > > > > > > > > > NoSQL persistence is not transactional. Even normal commits
> > > > > > > > > > > leave some amount of historical data in the database. Failed
> > > > > > > > > > > commits may leave remnants of preparatory data in the
> > > > > > > > > > > database too.
> > > > > > > > > > >
> > > > > > > > > > > If not cleaned up, this will lead to virtually unbounded
> > > > > > > > > > > growth of persisted data over time.
> > > > > > > > > > >
> > > > > > > > > > > Therefore, some periodic async cleanup is necessary. The
> > > > > > > > > > > maintenance code in PR [3268] provides the fundamental code
> > > > > > > > > > > for performing this cleanup.
> > > > > > > > > > >
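The periodic cleanup described above is, in general terms, a reachability problem. The Java sketch below shows a generic mark-and-sweep over a hypothetical object store; `StoredObject`, the root set, and the creation-time cutoff are illustrative assumptions, not the algorithm implemented in PR [3268]. The cutoff is an assumed safeguard that spares recently written, still-unreferenced objects, since they may belong to a commit that is still in progress.

```java
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PurgeSketch {

    // Hypothetical stored object: an id, its creation time, and the ids it references.
    record StoredObject(String id, Instant createdAt, List<String> references) {}

    // Returns ids unreachable from the roots AND created before the cutoff.
    static Set<String> purgeCandidates(Map<String, StoredObject> store,
                                       Set<String> roots, Instant cutoff) {
        // Mark: walk references from the roots (e.g. current and retained commits).
        Set<String> live = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String id = pending.pop();
            StoredObject obj = store.get(id);
            if (obj == null || !live.add(id)) {
                continue; // dangling reference or already visited
            }
            pending.addAll(obj.references());
        }
        // Sweep: unmarked objects older than the cutoff are deletion candidates.
        Set<String> candidates = new HashSet<>();
        for (StoredObject obj : store.values()) {
            if (!live.contains(obj.id()) && obj.createdAt().isBefore(cutoff)) {
                candidates.add(obj.id());
            }
        }
        return candidates;
    }
}
```

Because the mark phase is recomputed from scratch on every run, a run that is killed midway can simply be repeated, which fits the async-maintenance point made elsewhere in this thread.
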
> > > > > > > > > > > * Why does it have to be in the main repo?
> > > > > > > > > > >
> > > > > > > > > > > The code in PR [3268] has to align tightly with the actual
> > > > > > > > > > > NoSQL Persistence implementation. It has to evolve in sync
> > > > > > > > > > > with the data model of stored data.
> > > > > > > > > > >
> > > > > > > > > > > Therefore, it is logical to keep it in the same repo as the
> > > > > > > > > > > mainstream NoSQL Persistence code.
> > > > > > > > > > >
> > > > > > > > > > > * Why is CEL required?
> > > > > > > > > > >
> > > > > > > > > > > CEL was chosen based on prior work when the NoSQL Persistence
> > > > > > > > > > > was developed in private. It provides an efficient and
> > > > > > > > > > > expressive medium for admin users to define NoSQL maintenance
> > > > > > > > > > > policies.
> > > > > > > > > > >
> > > > > > > > > > > * Why is the Nessie CEL java impl. used?
> > > > > > > > > > >
> > > > > > > > > > > The Nessie CEL java impl. predates the Google impl. and has
> > > > > > > > > > > been used in production for years under various projects
> > > > > > > > > > > (including Nessie itself). The developers of the NoSQL
> > > > > > > > > > > persistence are more certain of the runtime behavior of the
> > > > > > > > > > > Nessie CEL impl. than of Google's. Switching to Google's CEL
> > > > > > > > > > > java requires additional work.
> > > > > > > > > > >
> > > > > > > > > > > * Can we express maintenance policies in some other, non-CEL way?
> > > > > > > > > > >
> > > > > > > > > > > Generally yes. However, this requires extra work and analysis
> > > > > > > > > > > of UX impact. If anyone has a concrete proposal for non-CEL
> > > > > > > > > > > maintenance policies, ideas / PRs are welcome for discussion,
> > > > > > > > > > > of course.
> > > > > > > > > > >
> > > > > > > > > > > * Why does the Admin Tool have to have maintenance commands [3395]?
> > > > > > > > > > >
> > > > > > > > > > > This is to allow users of Apache Polaris binary distributions
> > > > > > > > > > > to perform maintenance should they choose NoSQL Persistence.
> > > > > > > > > > > The Admin Tool is a natural home for the maintenance CLI
> > > > > > > > > > > because it is in fact intended to perform direct manipulation
> > > > > > > > > > > of the Polaris database, such as creating the schema and
> > > > > > > > > > > bootstrapping realms (existing functionality).
> > > > > > > > > > >
> > > > > > > > > > > * Can the maintenance command [3395] live in the polaris-tools repo?
> > > > > > > > > > >
> > > > > > > > > > > This would effectively require the Admin Tool to live in
> > > > > > > > > > > polaris-tools, which seems to be against the recent move to
> > > > > > > > > > > unify Admin and Service binaries [3340].
> > > > > > > > > > >
> > > > > > > > > > > * Can the maintenance code be invoked in some other way (non-Admin-CLI)?
> > > > > > > > > > >
> > > > > > > > > > > Yes. For example, it is possible to build docker images
> > > > > > > > > > > dedicated to running the maintenance tasks without using the
> > > > > > > > > > > Admin CLI. This is not implemented in Apache Polaris yet. The
> > > > > > > > > > > Admin CLI appears to offer the best UX for admin users with
> > > > > > > > > > > minimal developer effort.
> > > > > > > > > > >
> > > > > > > > > > > [1] https://github.com/apache/polaris/pull/3268#pullrequestreview-3576273215
> > > > > > > > > > >
> > > > > > > > > > > [3340] https://github.com/apache/polaris/pull/3340
> > > > > > > > > > >
> > > > > > > > > > > [3268] https://github.com/apache/polaris/pull/3268
> > > > > > > > > > >
> > > > > > > > > > > [3395] https://github.com/apache/polaris/pull/3395
> > > > > > > > > > >
> > > > > > > > > > > Thoughts? Comments?
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Dmitri.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
