Thanks for the summary, Alex!

+1 to all points.

Re: actionable items... maybe something like this?

1) Remove bootstrap from the main persistence call paths. Ideally, make a
new interface for it.

2) Control auto-bootstrap from top-level CDI constructs like
ServiceProducers.

3) Use Quarkus config for schema options. This way, admin users control the
schema the same way they control DataSources.

4) Adjust PR [2196] to leverage the new config and perhaps add CLI options
to opt in/out of schema changes.
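
A rough Java sketch of how items 1-3 could fit together, purely for
illustration. Every type and method name below (SchemaSetup, RealmBootstrap,
SchemaSetupMode, AutoBootstrapper) is hypothetical and does not exist in
Polaris; the real wiring would presumably go through CDI producers and
Quarkus config rather than plain constructors.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: all names here are hypothetical, not Polaris APIs.
public class BootstrapSketch {

  /** Creates or updates the database schema (expected once per deployment cycle). */
  interface SchemaSetup {
    void setupSchema();
  }

  /** Bootstraps one realm; kept separate from the main persistence call paths. */
  interface RealmBootstrap {
    void bootstrapRealm(String realmId);
  }

  /** Assumed config value; in a real service it would be bound from Quarkus config. */
  enum SchemaSetupMode { NEVER, STARTUP }

  /** Top-level component that decides what runs automatically at startup. */
  static final class AutoBootstrapper {
    private final SchemaSetupMode mode;
    private final boolean autoBootstrap;
    private final SchemaSetup schemaSetup;
    private final RealmBootstrap realmBootstrap;

    AutoBootstrapper(SchemaSetupMode mode, boolean autoBootstrap,
        SchemaSetup schemaSetup, RealmBootstrap realmBootstrap) {
      this.mode = mode;
      this.autoBootstrap = autoBootstrap;
      this.schemaSetup = schemaSetup;
      this.realmBootstrap = realmBootstrap;
    }

    void onStartup(List<String> realms) {
      // Schema setup is opt-in and independent of realm bootstrap.
      if (mode == SchemaSetupMode.STARTUP) {
        schemaSetup.setupSchema();
      }
      // Realm bootstrap is controlled by its own switch.
      if (autoBootstrap) {
        for (String realm : realms) {
          realmBootstrap.bootstrapRealm(realm);
        }
      }
    }
  }

  /** Runs the startup logic with recording stubs and returns the recorded actions. */
  static List<String> run(SchemaSetupMode mode, boolean autoBootstrap, List<String> realms) {
    List<String> log = new ArrayList<>();
    AutoBootstrapper bootstrapper = new AutoBootstrapper(
        mode, autoBootstrap,
        () -> log.add("schema"),
        realm -> log.add("realm:" + realm));
    bootstrapper.onStartup(realms);
    return log;
  }

  public static void main(String[] args) {
    System.out.println(run(SchemaSetupMode.STARTUP, true, List.of("r1", "r2")));
    System.out.println(run(SchemaSetupMode.NEVER, false, List.of("r1")));
  }
}
```

The split keeps the schema step at most once per startup, while the realm
bootstrap interface can be invoked again for new realms without redeploying.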

From my POV the admin user is likely to create / update the schema once per
deployment cycle, but may have to bootstrap (new) realms multiple times
without redeploying Polaris.

WDYT?

Cheers,
Dmitri.

[2196] https://github.com/apache/polaris/pull/2196


On Tue, Aug 5, 2025 at 3:39 AM Alexandre Dutra <adu...@apache.org> wrote:

> Hi all,
>
> Trying to summarize the opinions we've got so far:
>
> 1) On the server side:
>
> - Decouple schema setup and bootstrap
> - Schema setup should be off by default (opt-in)
> - Schema setup should use "defaults"
> - Schema setup is rather for tests or evaluation (=> production
> readiness warning?)
>
> 2) On the tool side:
>
> - Schema setup should not be a first-class citizen but part of the
> bootstrap process
> - Schema setup could expose more options to fine-tune the process (but
> read my concern below)
>
> 3) Schemas per realm
>
> - While Polaris today has only one schema for all realms, we should
> leave the door open for schema-per-realm.
>
> I must say I'm a bit on the fence about the number of options to
> expose, and in particular I am not sure we should allow partial schema
> creation. While I understand the idea is appealing, the devil is in
> the details. Schema upgrades are a complex topic (there are tools for
> that), and we don't want to be accused of data corruption because the
> schema setup process wrongly inferred the schema upgrade steps to
> perform. Also, just creating missing tables is generally not enough:
> you need to also populate them and preserve data integrity. I'd prefer
> that we start with something simple (the schema either exists or
> doesn't exist at all).
>
> Can we maybe start to transform the above ideas into actionable items?
>
> Thanks,
> Alex
>
>
> On Mon, Aug 4, 2025 at 10:58 PM Yufei Gu <flyrain...@gmail.com> wrote:
> >
> > > In practice though it is possible that different realms get
> > > bootstrapped with different database schemas and I think we should try
> > > to make the service resilient to situations like this
> >
> > +1 on this. Each realm could have its own schema, or even a different
> > Postgres server. For a bit more context, we tried to implement this
> > while doing the JDBC implementation. The only reason we didn't is a
> > Quarkus configuration limit: it can only support one data source.
> >
> > Yufei
> >
> >
> > On Fri, Aug 1, 2025 at 10:56 PM Eric Maynard <eric.w.mayn...@gmail.com>
> > wrote:
> >
> > > +1 to everything Dmitri said, well put.
> > >
> > > Ideally, I would say that "schema" ought not to be a first-class
> > > concept within Polaris administration, just whether a realm has been
> > > bootstrapped or not. In practice though it is possible that different
> > > realms get bootstrapped with different database schemas and I think we
> > > should try to make the service resilient to situations like this.
> > >
> > > As far as the PR itself, I agree that giving admins the *option* to
> > > refine bootstrapping down into substeps seems useful. Beyond separating
> > > the schema creation as the PR proposes, one could imagine being able to
> > > just re-create a certain table or something like that. Maybe there could
> > > be options, such as whether to include some particular table (e.g. they
> > > aren't using events, so don't make an events table). However, I agree
> > > with Dmitri that the default should just be a plain & complete bootstrap.
> > >
> > > --EM
> > >
> > > On Sat, Aug 2, 2025 at 8:51 AM Dmitri Bourlatchkov <di...@apache.org>
> > > wrote:
> > >
> > > > Hi Alex,
> > > >
> > > > Thanks for starting this thread! My opinions below.
> > > >
> > > > > 1) Should schema setup be separate from realm bootstrapping?
> > > >
> > > > I think it should be possible to execute the two actions separately.
> > > > Yet, I do not mind running them together by default.
> > > >
> > > > > 2) Should the server perform schema setup at all?
> > > >
> > > > Only if the server is configured to bootstrap automatically (which
> > > > should be off by default for any real persistence).
> > > >
> > > > > 3) different schemas per realm?
> > > >
> > > > I cannot imagine how this might work... Do you mean different
> > > > schemas in different realms? That feels like a super narrow use case
> > > > to me, although I do not mind supporting it.
> > > >
> > > > > 4) options [...]
> > > >
> > > > I believe the primary schema setup case (with many options, etc.)
> > > > should be through the admin tool.
> > > >
> > > > If the server bootstraps automatically, it should use only defaults
> > > > for the schema setup. I consider this case only as a "test" or
> > > > "getting started" use case (i.e. not "production").
> > > >
> > > > Cheers,
> > > > Dmitri.
> > > >
> > > >
> > > >
> > > > On Fri, Aug 1, 2025 at 6:43 AM Alexandre Dutra <adu...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > A nice recent contribution [1], still under review, proposes to
> > > > > create a separate admin tool command for setting up the database
> > > > > schema.
> > > > >
> > > > > Currently, schema setup is done as part of realm bootstrapping [2],
> > > > > which can happen at server bootstrap, when running the tool's
> > > > > bootstrap command, or on-demand, depending on the
> > > > > RealmContextResolver in use. For this reason, the setup script must
> > > > > be idempotent.
> > > > >
> > > > > But the PR raised several questions. I will try to summarize them
> > > > > here:
> > > > >
> > > > > 1) Should schema setup be separate from realm bootstrapping?
> > > > >
> > > > > 2) Should the server perform schema setup at all? If so, when? At
> > > > > startup, or when a new realm is resolved?
> > > > >   - Side question: if the server doesn't perform schema setup, how
> > > > >     are tests going to do it?
> > > > >
> > > > > 3) Do we want/need to support – or at least leave the door open
> > > > > for – different schemas per realm?
> > > > >
> > > > > 4) If we introduce an option to control how schema setup is done:
> > > > >   - Should it be a configuration option in application.properties?
> > > > >     It would then be available to both server and tool.
> > > > >   - Should it be an option of the admin tool's `bootstrap` command?
> > > > >     Opt-in or opt-out?
> > > > >   - Should it be an admin tool's separate command like `setup`?
> > > > >
> > > > > For 4) I would be in favor of a new configuration option, e.g.:
> > > > >
> > > > > polaris.schema.setup-mode=NEVER|STARTUP|PER_REALM
> > > > >
> > > > > It would be accessible to both server and tool, and the default
> > > > > could be NEVER for production, and STARTUP for tests. PER_REALM
> > > > > could be introduced later.
> > > > >
> > > > > I'm curious to see what others think.
> > > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > > [1] https://github.com/apache/polaris/pull/2196
> > > > > [2]
> > > > > https://github.com/apache/polaris/blob/2117dbd08e8352b32a2c948ed6c166d7c77da50a/persistence/relational-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/JdbcMetaStoreManagerFactory.java#L148-L158
> > > > >
> > > >
> > >
>
