Hi all,

I would also like to reiterate that Quarkus has no particular support for
multi-tenancy with options 1 or 2: unless there are only a handful of
datasources to use, and they can all be fixed at build time (which I think
is not the case here), we'd need to manage the datasources and their
connection pools ourselves. I hope we are all aware of that and OK with it.

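To make "manage the datasources and their connection pools ourselves" concrete, here is a minimal Java sketch, not Polaris or Quarkus code. It assumes a hypothetical `RealmPoolRegistry` that lazily creates one pool per realm through a caller-supplied factory. The pool type `P` is a placeholder for whatever pooled `DataSource` would actually be used.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: one lazily created connection pool per realm (option 1),
// managed by the application instead of Quarkus' build-time datasource config.
// P is a placeholder for the real pool type (e.g. a pooled DataSource).
final class RealmPoolRegistry<P extends AutoCloseable> implements AutoCloseable {
  private final Map<String, P> pools = new ConcurrentHashMap<>();
  private final Function<String, P> poolFactory; // builds a pool from per-realm settings

  RealmPoolRegistry(Function<String, P> poolFactory) {
    this.poolFactory = poolFactory;
  }

  // Returns the realm's pool, creating it on first use; later calls reuse it.
  P poolFor(String realmId) {
    return pools.computeIfAbsent(realmId, poolFactory);
  }

  // Closes and forgets a realm's pool, e.g. when the realm is deleted.
  void evict(String realmId) {
    P pool = pools.remove(realmId);
    if (pool != null) {
      try {
        pool.close();
      } catch (Exception e) {
        throw new IllegalStateException("Failed to close pool for realm " + realmId, e);
      }
    }
  }

  int size() {
    return pools.size();
  }

  // Closes every pool, e.g. on application shutdown.
  @Override
  public void close() {
    for (String realmId : pools.keySet()) {
      evict(realmId);
    }
  }
}
```

Even this small sketch has to deal with pool lifecycle (creation, eviction, shutdown) and with where per-realm connection settings and secrets come from. That is the extra work Quarkus would otherwise do for us. Note also that `computeIfAbsent` holds a map lock while the factory runs, so slow pool creation for one realm can briefly delay others.
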
Thanks,

Alex

On Mon, Apr 21, 2025 at 11:06 PM Dmitri Bourlatchkov <di...@apache.org>
wrote:

> My point is that if we do not include the realm ID in the primary key
> (option 1), then we're effectively forcing all users to deploy Polaris with
> a DataSource-per-realm approach. I do not see how we can decouple this
> concern from the JDBC schema. Any subsequent schema changes will complicate
> upgrades.
>
> My personal opinion is that we do not have to force users this way, and
> can offer deployment flexibility as discussed previously.
>
> I do not really see any operational ambiguity in option 3. Administrators
> have to define a DataSource anyway. Diligent Administrators have to
> understand the JDBC schema anyway. If the config defaults are such that
> reusing a DataSource for many realms is _not_ allowed, an Administrator
> cannot mix data by mistake.
>
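The safe-default argument above can be illustrated with a small startup check. The following is a minimal Java sketch under stated assumptions: a hypothetical realm-to-DataSource mapping and a hypothetical `allowShared` flag that defaults to `false`. Neither exists in Polaris today.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical startup check for a realm -> DataSource-name mapping.
final class RealmDataSourceGuard {
  private RealmDataSourceGuard() {}

  // Fails if two realms map to the same DataSource while sharing is disabled
  // (the assumed default), so an administrator cannot mix data by mistake.
  static void validate(Map<String, String> dataSourceByRealm, boolean allowShared) {
    if (allowShared) {
      return; // option 3: rows are separated by the realm ID in the primary key
    }
    Map<String, String> realmByDataSource = new HashMap<>();
    for (Map.Entry<String, String> e : dataSourceByRealm.entrySet()) {
      String previous = realmByDataSource.putIfAbsent(e.getValue(), e.getKey());
      if (previous != null) {
        throw new IllegalStateException(
            "DataSource '" + e.getValue() + "' is used by realms '" + previous
                + "' and '" + e.getKey() + "', but sharing is not allowed");
      }
    }
  }
}
```

With sharing disabled, every realm needs its own DataSource (option 1 behavior). Sharing becomes an explicit opt-in that only makes sense if the schema already carries the realm ID.
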
> Also, I believe the extra code complexity is negligible compared to the
> complexity of ensuring correct operation during concurrent updates.
>
> While it is not my intention to block going with option 1 only, I believe
> we have to make project decisions with clarity. Therefore, I am raising this
> point (again) and asking people to acknowledge that this is indeed the
> direction we want to go.
>
> Thanks,
> Dmitri.
>
> On Mon, Apr 21, 2025 at 1:22 PM Prashant Singh
> <prashant.si...@snowflake.com.invalid> wrote:
>
> > Hey All,
> >
> > Based on our recent discussion and the PR feedback, it seems like we need
> > more in-depth conversations to align on the best path forward.
> >
> > Considering this, I'd like to propose we decouple this particular feature
> > from the current JDBC implementation.
> >
> > My reasoning for this suggestion is as follows:
> >
> >    1. Following the precedent set by EclipseLink, the initial goal of the
> >    JDBC implementation was to *replace* EclipseLink. This new feature feels
> >    like an addition to that core effort.
> >    2. We anticipate revisiting schema changes when we discuss a separate
> >    DAO for the Entity layer. This means the schema we're currently
> >    considering isn't necessarily final.
> >    3. Many users are eagerly awaiting the JDBC implementation due to the
> >    scalability limitations of the current EclipseLink solution. Decoupling
> >    this might allow us to deliver the core JDBC benefits sooner.
> >
> > I'd love to hear your thoughts on this proposal.
> >
> > Best, Prashant
> >
> >
> > On Fri, Apr 18, 2025 at 3:57 PM Yufei Gu <flyrain...@gmail.com> wrote:
> >
> > > Thanks for the thoughtful input.
> > >
> > > While it's true that some environments may not require strict separation
> > > between realms, the risk of incorrect usage or subtle cross-realm
> > > interference is significantly higher if we allow shared databases without
> > > enforcing strong boundaries.
> > >
> > > Option 1 gives us strong, predictable isolation with minimal complexity
> > > and fewer edge cases. Yes, even with option 1, isolation may still be
> > > compromised if multiple realms are mixed in the same JVM, but at least
> > > the design makes this explicit and easier to reason about. Running one
> > > realm per Polaris instance is a reasonable solution for environments that
> > > value isolation, and option 1 just works, while option 3 adds unnecessary
> > > complexity.
> > >
> > > I believe adding support for both option 1 and option 3 introduces not
> > > just code complexity, but also operational ambiguity and a burden on
> > > users to fully understand the trade-offs. Instead of delegating this to
> > > admins, we should first aim for clarity and safety in the design.
> > >
> > > We can always revisit this in the future if a strong real-world use case
> > > arises. For now, I’d prefer we keep the design simple and unambiguous.
> > >
> > > Yufei
> > >
> > >
> > > On Fri, Apr 18, 2025 at 3:17 PM Dmitri Bourlatchkov <di...@apache.org>
> > > wrote:
> > >
> > > > I believe users of Apache Polaris may want to share the database across
> > > > many realms in environments that do not need secure separation of
> > > > realms. This is hypothetical at this point, of course. However, if
> > > > option 3 is not supported by the code, that use case will be impossible
> > > > (or will require subsequent changes and releases).
> > > >
> > > > Even with option 1, if multiple realms are mixed in memory, the
> > > > isolation guarantees are not much stronger than with option 3. If the
> > > > main concern is strong isolation, then Polaris Servers should run with
> > > > only one realm per instance (per JVM).
> > > >
> > > > I propose to delegate this decision to the Polaris admin.
> > > >
> > > > I do not think the code will have to be more complex to support both
> > > > options 1 and 3 compared to option 1 alone. In fact, as far as I can
> > > > tell, supporting option 1 plus multiple realms per JVM is more complex
> > > > than option 3 alone.
> > > >
> > > > Cheers,
> > > > Dmitri.
> > > >
> > > >
> > > > On Fri, Apr 18, 2025 at 4:38 PM Yufei Gu <flyrain...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Folks,
> > > > >
> > > > > As we discussed, option 1 provides the strongest isolation, which
> > > > > should work particularly well for dynamically created data sources.
> > > > > Another significant benefit is that it's less complicated overall.
> > > > >
> > > > > I'm not convinced we need both option 1 and option 3. For scenarios
> > > > > involving only a single realm, the concept of a realm becomes
> > > > > unnecessary. In that case, there's no need for any additional
> > > > > options, including option 3.
> > > > >
> > > > > Yufei
> > > > >
> > > > >
> > > > > On Tue, Apr 15, 2025 at 11:19 AM Dmitri Bourlatchkov <di...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Going with options 1 and 3 initially sounds good to me. This should
> > > > > > simplify current JDBC PRs too.
> > > > > >
> > > > > > We can certainly add capabilities later, because having realm ID in
> > > > > > the PR does not preclude other deployment choices.
> > > > > >
> > > > > > Cheers,
> > > > > > Dmitri.
> > > > > >
> > > > > > On Tue, Apr 15, 2025 at 1:49 PM Michael Collado <collado.m...@gmail.com>
> > > > > > wrote:
> > > > > > > My $.02 is that Option 1 is entirely possible using a DataSource
> > > > > > > that dynamically creates Connections as needed. Option 1 is nice
> > > > > > > because, as Pierre said, it gives admins the ability to
> > > > > > > dynamically allocate resources to different clients as needed.
> > > > > > >
> > > > > > > Personally, I'm less inclined to option 3 just because it means a
> > > > > > > potentially larger blast radius if database credentials are ever
> > > > > > > leaked. But if most end users are expecting to only manage a
> > > > > > > single realm, it's probably the easiest option and solves the most
> > > > > > > common use case.
> > > > > > >
> > > > > > > I like the option of combining 1 and 3: by default, a single-tenant
> > > > > > > deployment writes to a single end database, but admins have the
> > > > > > > ability to configure dynamic connections to different database
> > > > > > > endpoints if multiple realms are supported.
> > > > > > >
> > > > > > > Mike
> > > > > > >
> > > > > > > On Tue, Apr 15, 2025 at 9:32 AM Alex Dutra <alex.du...@dremio.com.invalid>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I agree with Pierre's, JB's and Dmitri's points. I’d like to add
> > > > > > > > some context from the Quarkus configuration angle:
> > > > > > > >
> > > > > > > > Option 1, which involves distinct datasources, presents a
> > > > > > > > challenge. Quarkus requires all datasources to be present and
> > > > > > > > fully configured at build time. This requirement could be quite
> > > > > > > > cumbersome for end users, making this option less user-friendly
> > > > > > > > in practice.
> > > > > > > >
> > > > > > > > Regarding Option 2, while it's theoretically possible to manage
> > > > > > > > multiple schemas with a single datasource, implementing this can
> > > > > > > > be complex. To effectively work with different schemas in
> > > > > > > > PostgreSQL, you would need to either qualify all table
> > > > > > > > identifiers or adjust the `search_path` URL parameter.
> > > > > > > > Additionally, other JDBC backends like MySQL don't support
> > > > > > > > multiple schemas per database, which would make Option 2 less
> > > > > > > > portable across different JDBC databases.
> > > > > > > >
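Both PostgreSQL techniques mentioned for option 2 can be sketched in a few lines of Java. This is a minimal illustration, not Polaris code. The `realm_<id>` schema naming scheme and the `entities` table in the usage are hypothetical.

```java
// Hypothetical helper for option 2: one PostgreSQL schema per realm.
final class RealmSchemas {
  private RealmSchemas() {}

  // Quotes an identifier for PostgreSQL, doubling any embedded double quotes.
  static String quote(String identifier) {
    return "\"" + identifier.replace("\"", "\"\"") + "\"";
  }

  // Assumed naming scheme: realm "acme" maps to schema "realm_acme".
  static String schemaFor(String realmId) {
    return "realm_" + realmId;
  }

  // Technique 1: qualify every table reference with the realm's schema.
  static String qualifiedTable(String realmId, String table) {
    return quote(schemaFor(realmId)) + "." + quote(table);
  }

  // Technique 2: point the session's search_path at the realm's schema.
  static String setSearchPath(String realmId) {
    return "SET search_path TO " + quote(schemaFor(realmId));
  }
}
```

For example, `RealmSchemas.qualifiedTable("acme", "entities")` yields `"realm_acme"."entities"`. The first technique touches every statement. The second is stateful: a pooled connection keeps its `search_path` when returned to the pool, so it must be reset on every checkout. The pgjdbc `currentSchema` connection property sets the search path too, but only once per datasource, not per realm request.
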
> > > > > > > > That's why I think Option 3 is the most portable one, and the
> > > > > > > > easiest for users or administrators to configure. As Pierre
> > > > > > > > noted, it is subject to noisy-neighbor interference, but to some
> > > > > > > > extent, I think interference could also happen with separate
> > > > > > > > schemas like in option 2.
> > > > > > > >
> > > > > > > > Just my 2 cents.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Alex
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Apr 15, 2025 at 4:00 PM Dmitri Bourlatchkov <di...@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for your perspective, Pierre! You make good points and I
> > > > > > > > > agree with them.
> > > > > > > > >
> > > > > > > > > From my POV, I'd add that we probably need to take deployment
> > > > > > > > > concerns into account too.
> > > > > > > > >
> > > > > > > > > If the deployment uses the database-per-realm approach (option 1),
> > > > > > > > > then someone has to provide database connection parameters
> > > > > > > > > (including secrets). If that is the deployment administrator,
> > > > > > > > > then the admin necessarily has to be aware of all realms and
> > > > > > > > > effectively has control of the data in all realms. Isolation is
> > > > > > > > > achieved only for end users.
> > > > > > > > >
> > > > > > > > > That said, even with option 3 the deployment owner has control
> > > > > > > > > over all realms, and end users are isolated as far as their
> > > > > > > > > access to APIs is concerned. End users cannot discover each
> > > > > > > > > other's data (barring coding mistakes in Polaris). The same goes
> > > > > > > > > for option 2, as it's the middle ground.
> > > > > > > > >
> > > > > > > > > I do not see any material difference between options 1, 2 and 3
> > > > > > > > > from the end user's perspective.
> > > > > > > > >
> > > > > > > > > If, however, the database connection parameters are not
> > > > > > > > > controlled by the administrator, but by the end user who wants
> > > > > > > > > to define a realm, then Polaris needs to expose the management
> > > > > > > > > of database connections and secrets. This may be a valuable
> > > > > > > > > feature, but I believe it is far beyond current Polaris backend
> > > > > > > > > capabilities. I do not think going this way is justified at
> > > > > > > > > this time.
> > > > > > > > >
> > > > > > > > > I'd like to propose a hybrid approach where Polaris provides
> > > > > > > > > capabilities (and config) for the administrators to choose
> > > > > > > > > between options 1, 2 and 3 according to their specific
> > > > > > > > > deployment concerns.
> > > > > > > > >
> > > > > > > > > This means that the primary key has to include the realm ID,
> > > > > > > > > because if the Polaris code does not provide it, then the admin
> > > > > > > > > will not be able to choose option 3 at runtime.
> > > > > > > > >
> > > > > > > > > WDYT?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Dmitri.
> > > > > > > > >
> > > > > > > > > On Tue, Apr 15, 2025 at 8:35 AM Pierre Laporte <pie...@pingtimeout.fr>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Prashant
> > > > > > > > > >
> > > > > > > > > > I guess the answer will depend on how easy it should be for
> > > > > > > > > > Polaris to support multi-tenancy.
> > > > > > > > > >
> > > > > > > > > > A separate database per realm would allow administrators to
> > > > > > > > > > limit the amount of resources that a realm can consume (e.g.
> > > > > > > > > > the maximum number of database connections). Indeed, it would
> > > > > > > > > > be one of the strongest isolation modes. However, the code
> > > > > > > > > > would need to support a complete database configuration per
> > > > > > > > > > realm (think username and password, and possibly IP address)
> > > > > > > > > > if the goal is to match Postgres capabilities. In terms of
> > > > > > > > > > backup/restore, it is the most flexible option.
> > > > > > > > > >
> > > > > > > > > > A "one schema per realm" approach would be simpler in terms
> > > > > > > > > > of datasource configuration. However, there would be less
> > > > > > > > > > isolation between realms, and a resource utilization spike on
> > > > > > > > > > one realm could impact the performance of another realm. It is
> > > > > > > > > > as flexible as option #1 regarding backup and restore.
> > > > > > > > > >
> > > > > > > > > > A "realm as part of the primary key" approach is the most
> > > > > > > > > > efficient, in that the cost of adding tenants is close to
> > > > > > > > > > zero. Like in option #2, there is no real resource isolation
> > > > > > > > > > between tenants, and a noisy-neighbor situation is a possible
> > > > > > > > > > issue. The biggest difference is regarding backup and restore.
> > > > > > > > > > Consider the case where data is accidentally
> > > > > > > > > > wiped/corrupted/modified/... in a given tenant and
> > > > > > > > > > administrators want to restore it to a previous state. With
> > > > > > > > > > this approach, it is much more complex, as Postgres does not
> > > > > > > > > > (AFAIK) allow restoring tables partially.
> > > > > > > > > >
> > > > > > > > > > Just my 2 cents
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > Pierre
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Apr 15, 2025 at 12:42 AM Prashant Singh
> > > > > > > > > > <prashant.si...@snowflake.com.invalid> wrote:
> > > > > > > > > >
> > > > > > > > > > > Dear Polaris Community,
> > > > > > > > > > >
> > > > > > > > > > > This email initiates a discussion regarding the modeling of
> > > > > > > > > > > Realms within the Polaris project, following its recent
> > > > > > > > > > > mention in my JDBC implementation pull request:
> > > > > > > > > > > https://github.com/apache/polaris/pull/1287/files#r2040383971.
> > > > > > > > > > >
> > > > > > > > > > > My current understanding, based on available information, is
> > > > > > > > > > > that Realms were primarily intended for isolation.
> > > > > > > > > > > Consequently, the EclipseLink implementation treats each Realm
> > > > > > > > > > > as a separate database.
> > > > > > > > > > >
> > > > > > > > > > > As we are re-implementing this functionality, it was
> > > > > > > > > > > suggested that we gather community feedback on the optimal
> > > > > > > > > > > approach to modeling Realms.
> > > > > > > > > > >
> > > > > > > > > > > Based on my current understanding, here are potential
> > > > > > > > > > > modeling options:
> > > > > > > > > > >
> > > > > > > > > > > *1. Separate Databases per Realm:*
> > > > > > > > > > >
> > > > > > > > > > >    - Each Realm would correspond to a distinct database.
> > > > > > > > > > >    - This could be implemented using Quarkus custom data
> > > > > > > > > > >    sources, with one data source per Realm.
> > > > > > > > > > >
> > > > > > > > > > > *2. Separate Schemas per Realm:*
> > > > > > > > > > >
> > > > > > > > > > >    - Each Realm would correspond to a distinct database schema
> > > > > > > > > > >    within a single database.
> > > > > > > > > > >    - Most database systems support two-part identifiers
> > > > > > > > > > >    (<schema_name>.<table_name>), allowing for data isolation.
> > > > > > > > > > >
> > > > > > > > > > > *3. Realm as a Primary Key:*
> > > > > > > > > > >
> > > > > > > > > > >    - A realm identifier would be added as a primary key (or
> > > > > > > > > > >    part of a composite primary key) to each Polaris table.
> > > > > > > > > > >    - Data isolation would be enforced through filtering based
> > > > > > > > > > >    on this key during data access.
> > > > > > > > > > >
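Option 3 can be illustrated without a database. Below is a minimal Java sketch in which an in-memory map stands in for one shared table keyed by a composite `(realm_id, entity_id)` primary key. The class names `EntityKey` and `SharedEntityTable` are hypothetical, not the Polaris schema. Callers only receive realm-scoped views, so every access carries the realm ID.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Composite primary key (realm_id, entity_id), as described for option 3.
record EntityKey(String realmId, long entityId) {}

// In-memory stand-in for one table shared by all realms. Callers only get
// realm-scoped views, so every lookup includes the realm ID, just as every
// SQL statement would include a "realm_id = ?" predicate.
final class SharedEntityTable {
  private final Map<EntityKey, String> rows = new ConcurrentHashMap<>();

  RealmScope forRealm(String realmId) {
    return new RealmScope(realmId);
  }

  final class RealmScope {
    private final String realmId;

    private RealmScope(String realmId) {
      this.realmId = realmId;
    }

    // Insert or overwrite a row; the realm ID is always part of the key.
    void put(long entityId, String payload) {
      rows.put(new EntityKey(realmId, entityId), payload);
    }

    // Point lookup, equivalent to WHERE realm_id = ? AND entity_id = ?.
    Optional<String> get(long entityId) {
      return Optional.ofNullable(rows.get(new EntityKey(realmId, entityId)));
    }

    // Scan restricted to this realm's rows, equivalent to WHERE realm_id = ?.
    long count() {
      return rows.keySet().stream().filter(k -> k.realmId().equals(realmId)).count();
    }
  }
}
```

Two realms can reuse the same entity ID without colliding, because the realm is part of the key. The same table layout also works unchanged when each realm happens to get its own database, which is the flexibility argument made later in the thread. The flip side is that isolation depends on every query path going through a realm-scoped accessor.
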
> > > > > > > > > > > The optimal approach will likely depend on ease of use and
> > > > > > > > > > > maintainability for database administrators.
> > > > > > > > > > >
> > > > > > > > > > > Please share your thoughts and preferences regarding these
> > > > > > > > > > > options.
> > > > > > > > > > >
> > > > > > > > > > > Best regards,
> > > > > > > > > > >
> > > > > > > > > > > Prashant Singh
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
