Hi Dmitri, thanks for the comprehensive recap.

For "the newly added Maintenance module was not exposed in previous docs
related to NoSQL", I wonder whether this is just a misunderstanding.  As
Prashant noted, in the NoSQL presentation that was run a couple of times by
Adam [1], there is a mention of "A maintenance task in the Admin CLI
Tool".  And the original design doc [2] also contains an explanation as to
why this is necessary for NoSQL in the "Handling no longer needed objects"
section.  Am I missing something?

Regarding the repository choice, I would like to emphasize the potential
overhead in release management.  Today, we have a manual release process
that only spans the `apache/polaris` repository.  And we have a
semi-automated release process that is tightly coupled with the
`apache/polaris` repository, because it is implemented as GitHub
workflows within that repository.  Let's consider the potential
impacts on release process and cadence.

[1]
https://docs.google.com/presentation/d/1lX2EdvM0SeyuOdO_u1idlWfmnlH3hFE16JEyWo45Bdo/edit?slide=id.p24#slide=id.p24
[2]
https://docs.google.com/document/d/1POUWe0xMZOBoaJ6Rgiw35ziEoc6OEYCiW7Zk6bR9H6M/edit?tab=t.0#heading=h.ccj3ewbhhhhy
--

Pierre


On Wed, Jan 14, 2026 at 11:18 PM Dmitri Bourlatchkov <[email protected]>
wrote:

> Hi All,
>
> As Prashant mentioned in GH [1], the newly added Maintenance module was not
> exposed in previous docs related to NoSQL. Let's use this email thread to
> discuss it and possible concerns people may have. Below, I'm providing
> rationale for the topics of which I am aware. Please feel free to start new
> threads dedicated to other concerns. Let's keep this discussion focused on
> the NoSQL maintenance functionality, though.
>
> * Why is this code necessary?
>
> NoSQL persistence is not transactional. Even normal commits leave some
> amount of historical data in the database. Failed commits may leave
> remnants of preparatory data in the database too.
>
> If not cleaned up, this will lead to virtually indefinite growth of
> persisted data over time.
>
> Therefore, some periodic async cleanup is necessary. The maintenance code
> in PR [3268] provides fundamental code for performing this cleanup.
>
> * Why does it have to be in the main repo?
>
> The code in PR [3268] has to align tightly with the actual NoSQL
> Persistence implementation. It has to evolve in sync with the data model of
> stored data.
>
> Therefore, it is logical to keep it in the same repo as the mainstream
> NoSQL Persistence code.
>
> * Why is CEL required?
>
> CEL was chosen based on prior work when the NoSQL Persistence was developed
> in private. It provides an efficient and expressive medium for admin users
> to define NoSQL maintenance policies.
>
> * Why is the Nessie CEL java impl. used?
>
> The Nessie CEL java impl. predates the Google impl. and has been used in
> production for years under various projects (including Nessie itself). The
> developers of the NoSQL persistence are more certain of the runtime
> behavior of the Nessie CEL impl. than of Google's. Switching to Google's
> CEL java requires additional work.
>
> * Can we express maintenance policies in some other, non-CEL way?
>
> Generally yes. However, this requires extra work and analysis of UX impact.
> If anyone has a concrete proposal for non-CEL maintenance policies, ideas /
> PRs are welcome for discussion, of course.
>
> * Why does the Admin Tool have to have maintenance commands [3395]?
>
> This is to allow users of Apache Polaris binary distributions to perform
> maintenance should they choose NoSQL Persistence. The Admin Tool is a
> natural home for the maintenance CLI because it is in fact intended to
> perform direct manipulation of the Polaris database, such as creating the
> schema and bootstrapping realms (existing functionality).
>
> * Can the maintenance command [3395] live in the polaris-tools repo?
>
> This would effectively require the Admin Tool to live in polaris-tools,
> which seems to be against the recent move to unify Admin and Service
> binaries [3340].
>
> * Can the maintenance code be invoked in some other way (non-Admin-CLI)?
>
> Yes. For example, it is possible to build docker images dedicated to
> running the maintenance tasks without using the Admin CLI. This is not
> implemented in Apache Polaris yet. The Admin CLI appears to offer the best
> UX for admin users with minimal developer effort.
>
> [1]
> https://github.com/apache/polaris/pull/3268#pullrequestreview-3576273215
>
> [3340] https://github.com/apache/polaris/pull/3340
>
> [3268] https://github.com/apache/polaris/pull/3268
>
> [3395] https://github.com/apache/polaris/pull/3395
>
> Thoughts? Comments?
>
> Cheers,
> Dmitri.
>
