To clarify, Polaris *does* support multi-tenancy. What’s currently limited
with Option 1 is *account-level* multi-tenancy specifically in the context
of EclipseLink.

   1. *Multi-account support* is most relevant when a vendor wants to
   commercialize Polaris and offer it as a service to customers. It is
   unlikely to be a common need across Apache Polaris adoption in general,
   and I have not heard anyone ask for it in the Polaris community.
   2. *Promoting realmId for multi-account usage* may make sense for NoSQL
   backends, which are typically distributed and scale well. I believe this is
   what Snowflake and Dremio do with managed Polaris.
   However, this thread is focused on the *Postgres JDBC implementation*,
   which often runs on a single-node setup. In that case, Option 3 could
   introduce performance bottlenecks.
   3. *Option 1 provides stronger isolation*. Without it, a bug in the
   persistence layer or a misconfiguration at the RDBMS level could more
   easily cause cross-account data leakage. Remember that a realm may
   represent environments like dev, QA, or prod, where isolation matters
   (ref
   <https://github.com/polaris-catalog/polaris/blob/main/polaris-core/src/main/java/org/apache/polaris/core/context/RealmContext.java#L23-L23>).
   Plus, as Pierre said, a noisy neighbor could affect cross-account
   performance, and keep in mind that it’s a single-node database.
   4. A solution that mixes Option 1 and Option 3 would be a *breaking
   change*. It would require not only schema updates but also modifications
   to admin tooling like the bootstrap logic. If there's strong interest, I’d
   suggest pursuing that as a *separate proposal*.

Given all this, I don’t think the realmId schema change should block the
JDBC implementation work.


Yufei


On Tue, Apr 22, 2025 at 5:57 AM Alex Dutra <alex.du...@dremio.com.invalid>
wrote:

> Hi all,
>
> I also would like to reiterate that Quarkus has no particular support for
> multi-tenancy with options 1 or 2: unless there are only a handful of
> datasources to use, and they can all be fixed at build time (which I think
> is not the case here), we'd need to manage the datasources and their
> connection pools ourselves. I hope we are all aware of that and OK with it.
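Because options 1 and 2 mean Polaris itself, rather than Quarkus, would own the per-realm datasources and their pools, here is a minimal Java sketch of what that bookkeeping could look like. `RealmDataSourceRegistry` and its factory function are hypothetical names, not Polaris or Quarkus APIs; `T` stands in for a pooled `javax.sql.DataSource`, and the factory is where a real pool would be built from the realm's own configuration at runtime.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-realm registry for option 1: one pooled datasource per realm.
// T stands in for something like a pooled javax.sql.DataSource.
final class RealmDataSourceRegistry<T extends AutoCloseable> implements AutoCloseable {
    private final Map<String, T> byRealm = new ConcurrentHashMap<>();
    private final Function<String, T> factory; // builds a pool from the realm's own config

    RealmDataSourceRegistry(Function<String, T> factory) {
        this.factory = factory;
    }

    // Returns the realm's datasource, creating it on first use (runtime, not build time).
    T forRealm(String realmId) {
        return byRealm.computeIfAbsent(realmId, factory);
    }

    int size() {
        return byRealm.size();
    }

    // Closes every pool created so far; the application, not the framework, owns this.
    @Override
    public void close() {
        IllegalStateException failure = null;
        for (T dataSource : byRealm.values()) {
            try {
                dataSource.close();
            } catch (Exception e) {
                if (failure == null) {
                    failure = new IllegalStateException("failed to close a realm datasource", e);
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        byRealm.clear();
        if (failure != null) {
            throw failure;
        }
    }
}
```

The design choice is lazy, per-realm creation via `ConcurrentHashMap.computeIfAbsent`, so realms need not be known at build time; the trade-off is that connection-pool sizing and shutdown become application code.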
>
> Thanks,
>
> Alex
>
> On Mon, Apr 21, 2025 at 11:06 PM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
>
> > My point is that if we do not include realm ID in the Primary Key (option
> > 1), then we're effectively forcing all users to deploy Polaris with a
> > DataSource-per-Realm approach. I do not see how we can decouple this
> > concern from the JDBC schema. Any subsequent schema changes will
> > complicate upgrades.
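Including the realm ID in the primary key (option 3) could look roughly like the in-memory Java sketch below: the realm ID is the leading component of every key, so identical entity IDs in different realms never collide, and every lookup must name a realm. The record, class, table, and column names are hypothetical and do not reflect the actual Polaris JDBC schema.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Roughly the shape of a hypothetical option-3 table:
//   CREATE TABLE entities (realm_id TEXT NOT NULL, catalog_id BIGINT NOT NULL,
//                          id BIGINT NOT NULL, name TEXT,
//                          PRIMARY KEY (realm_id, catalog_id, id));
record EntityKey(String realmId, long catalogId, long entityId) {}

final class RealmScopedEntityStore {
    // In-memory stand-in for the table above, keyed by the composite primary key.
    private final Map<EntityKey, String> rows = new HashMap<>();

    void put(String realmId, long catalogId, long entityId, String name) {
        rows.put(new EntityKey(realmId, catalogId, entityId), name);
    }

    // Every lookup must supply a realm ID; there is deliberately no realm-less accessor.
    Optional<String> get(String realmId, long catalogId, long entityId) {
        return Optional.ofNullable(rows.get(new EntityKey(realmId, catalogId, entityId)));
    }

    // Equivalent of SELECT COUNT(*) ... WHERE realm_id = ?
    long countInRealm(String realmId) {
        return rows.keySet().stream().filter(k -> k.realmId().equals(realmId)).count();
    }
}
```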
> >
> > My personal opinion is that we do not have to force users this way, and
> > can instead offer deployment flexibility as discussed previously.
> >
> > I do not really see any operational ambiguity in option 3. Administrators
> > have to define a DataSource anyway. Diligent Administrators have to
> > understand the JDBC schema anyway. If the config defaults are such that
> > reusing a DataSource for many realms is _not_ allowed, an Administrator
> > cannot mix data by mistake.
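A minimal sketch of the "sharing disabled by default" safeguard described above, assuming a hypothetical realm-to-datasource-name mapping and a hypothetical `allowSharedDataSources` flag (neither is existing Polaris configuration): startup fails if two realms resolve to the same datasource unless the administrator explicitly opts in to option 3.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical startup check; the config shape and flag name are illustrative only.
final class RealmDataSourceConfigCheck {
    private RealmDataSourceConfigCheck() {}

    // Rejects two realms pointing at the same datasource unless sharing is enabled.
    static void validate(Map<String, String> dataSourceByRealm, boolean allowSharedDataSources) {
        if (allowSharedDataSources) {
            return; // admin opted in: rows are then separated by realm ID in the key
        }
        Map<String, String> firstRealmByDataSource = new HashMap<>();
        for (Map.Entry<String, String> e : dataSourceByRealm.entrySet()) {
            String previous = firstRealmByDataSource.putIfAbsent(e.getValue(), e.getKey());
            if (previous != null) {
                throw new IllegalStateException(
                    "Realms '" + previous + "' and '" + e.getKey()
                        + "' share datasource '" + e.getValue()
                        + "', but shared datasources are disabled");
            }
        }
    }
}
```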
> >
> > Also, I believe the extra code complexity is negligible compared to the
> > complexity of ensuring correct operation during concurrent updates.
> >
> > While it is not my intention to block going with option 1 only, I believe
> > we have to make project decisions with clarity, therefore I raise this
> > point (again) and ask people to acknowledge that this is indeed the
> > direction we want to go.
> >
> > Thanks,
> > Dmitri.
> >
> > On Mon, Apr 21, 2025 at 1:22 PM Prashant Singh
> > <prashant.si...@snowflake.com.invalid> wrote:
> >
> > > Hey All,
> > >
> > > Based on our recent discussion and the PR feedback, it seems like we
> > > need more in-depth conversations to align on the best path forward.
> > >
> > > Considering this, I'd like to propose we decouple this particular
> > > feature from the current JDBC implementation.
> > >
> > > My reasoning for this suggestion is as follows:
> > >
> > >    1. Following the precedent set by EclipseLink, the initial goal of
> > >    the JDBC implementation was to *replace* EclipseLink. This new
> > >    feature feels like an addition to that core effort.
> > >    2. We anticipate revisiting schema changes when we discuss a
> > >    separate DAO for the Entity layer. This means the schema we're
> > >    currently considering isn't necessarily final.
> > >    3. Many users are eagerly awaiting the JDBC implementation due to
> > >    the scalability limitations of the current EclipseLink solution.
> > >    Decoupling this might allow us to deliver the core JDBC benefits
> > >    sooner.
> > >
> > > I'd love to hear your thoughts on this proposal.
> > >
> > > Best, Prashant
> > >
> > >
> > > On Fri, Apr 18, 2025 at 3:57 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > >
> > > > Thanks for the thoughtful input.
> > > >
> > > > While it's true that some environments may not require strict
> > > > separation between realms, the risk of incorrect usage or subtle
> > > > cross-realm interference is significantly higher if we allow shared
> > > > databases without enforcing strong boundaries.
> > > >
> > > > Option 1 gives us strong, predictable isolation with minimal
> > > > complexity and fewer edge cases. Yes, even with option 1, isolation
> > > > may still be compromised if multiple realms are mixed in the same
> > > > JVM, but at least the design makes this explicit and easier to reason
> > > > about. Running one realm per Polaris instance is a reasonable solution
> > > > for environments that value isolation, and option 1 just works, while
> > > > option 3 adds unnecessary complexity.
> > > >
> > > > I believe adding support for both option 1 and option 3 introduces
> > > > not just code complexity, but also operational ambiguity and a burden
> > > > on users to fully understand the trade-offs. Instead of delegating
> > > > this to admins, we should first aim for clarity and safety in the
> > > > design.
> > > >
> > > > We can always revisit this in the future if a strong real-world use
> > > > case arises. For now, I’d prefer we keep the design simple and
> > > > unambiguous.
> > > >
> > > > Yufei
> > > >
> > > >
> > > > On Fri, Apr 18, 2025 at 3:17 PM Dmitri Bourlatchkov <di...@apache.org>
> > > > wrote:
> > > >
> > > > > I believe users of Apache Polaris may want to share the database
> > > > > across many realms in environments that do not need secure
> > > > > separation of realms. This is hypothetical at this point, of course.
> > > > > However, if option 3 is not supported by the code, that use case
> > > > > will be impossible (or will require subsequent changes and
> > > > > releases).
> > > > >
> > > > > Even with option 1, if multiple realms are mixed in memory, the
> > > > > isolation guarantees are not much stronger than with option 3. If
> > > > > the main concern is strong isolation, then Polaris servers should
> > > > > run with only one realm per instance (per JVM).
> > > > >
> > > > > I propose to delegate this decision to the Polaris admin.
> > > > >
> > > > > I do not think the code will have to be more complex to support
> > > > > both options 1 and 3 compared to option 1 alone. In fact, as far as
> > > > > I can tell, supporting option 1 plus multiple realms per JVM is more
> > > > > complex than option 3 alone.
> > > > >
> > > > > Cheers,
> > > > > Dmitri.
> > > > >
> > > > >
> > > > > On Fri, Apr 18, 2025 at 4:38 PM Yufei Gu <flyrain...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Folks,
> > > > > >
> > > > > > As we discussed, option 1 provides the strongest isolation, which
> > > > > > should work particularly well for dynamically created data
> > > > > > sources. Another significant benefit is that it's less complicated
> > > > > > overall.
> > > > > >
> > > > > > I'm not convinced we need both option 1 and option 3. For
> > > > > > scenarios involving only a single realm, the concept of a realm
> > > > > > becomes unnecessary. In that case, there's no need for any
> > > > > > additional options, including option 3.
> > > > > >
> > > > > > Yufei
> > > > > >
> > > > > >
> > > > > > On Tue, Apr 15, 2025 at 11:19 AM Dmitri Bourlatchkov <di...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Going with options 1 and 3 initially sounds good to me. This
> > > > > > > should simplify current JDBC PRs too.
> > > > > > >
> > > > > > > We can certainly add capabilities later, because having realm ID
> > > > > > > in the PR does not preclude other deployment choices.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Dmitri.
> > > > > > >
> > > > > > > On Tue, Apr 15, 2025 at 1:49 PM Michael Collado <collado.m...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > My $.02 is that Option 1 is entirely possible using a
> > > > > > > > DataSource that dynamically creates Connections as needed.
> > > > > > > > Option 1 is nice because, as Pierre said, it gives admins the
> > > > > > > > ability to dynamically allocate resources to different clients
> > > > > > > > as needed.
> > > > > > > >
> > > > > > > > Personally, I'm less inclined toward option 3 because it means
> > > > > > > > a potentially larger blast radius if database credentials are
> > > > > > > > ever leaked. But if most end users expect to manage only a
> > > > > > > > single realm, it's probably the easiest option and solves the
> > > > > > > > most common use case.
> > > > > > > >
> > > > > > > > I like the option of combining 1 and 3: by default, a
> > > > > > > > single-tenant deployment writes to a single database, but
> > > > > > > > admins can configure dynamic connections to different database
> > > > > > > > endpoints if multiple realms are supported.
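The combined approach Mike describes could resolve each realm's database endpoint roughly as follows: realms without an override share the default database (rows separated by realm ID, as in option 3), while realms with an override get a dedicated endpoint (option 1). This is a hypothetical sketch; `DbEndpoint`, `RealmEndpointResolver`, and the config shape are illustrative only, and secrets are deliberately left out.

```java
import java.util.Map;

// Hypothetical endpoint description; credentials would come from a secret store.
record DbEndpoint(String jdbcUrl, String username) {}

final class RealmEndpointResolver {
    private final DbEndpoint defaultEndpoint;       // shared database (option 3)
    private final Map<String, DbEndpoint> overrides; // dedicated databases (option 1)

    RealmEndpointResolver(DbEndpoint defaultEndpoint, Map<String, DbEndpoint> overrides) {
        this.defaultEndpoint = defaultEndpoint;
        this.overrides = Map.copyOf(overrides); // immutable snapshot of admin config
    }

    // Realms without an override fall back to the shared default database.
    DbEndpoint resolve(String realmId) {
        return overrides.getOrDefault(realmId, defaultEndpoint);
    }

    boolean isDedicated(String realmId) {
        return overrides.containsKey(realmId);
    }
}
```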
> > > > > > > >
> > > > > > > > Mike
> > > > > > > >
> > > > > > > > On Tue, Apr 15, 2025 at 9:32 AM Alex Dutra <alex.du...@dremio.com.invalid>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I'm in agreement with Pierre, JB and Dmitri's points. I’d like
> > > > > > > > > to add some context from the Quarkus configuration angle:
> > > > > > > > >
> > > > > > > > > Option 1, which involves distinct datasources, presents a
> > > > > > > > > challenge. Quarkus requires all datasources to be present and
> > > > > > > > > fully configured at build time. This requirement could be
> > > > > > > > > quite cumbersome for end users, making this option less
> > > > > > > > > user-friendly in practice.
> > > > > > > > >
> > > > > > > > > Regarding Option 2, while it's theoretically possible to
> > > > > > > > > manage multiple schemas with a single datasource,
> > > > > > > > > implementing this can be complex. To effectively work with
> > > > > > > > > different schemas in PostgreSQL, you would need to either
> > > > > > > > > qualify all table identifiers or adjust the `search_path` URL
> > > > > > > > > parameter. Additionally, other JDBC backends like MySQL don't
> > > > > > > > > support multiple schemas per database, which would make
> > > > > > > > > Option 2 less portable across different JDBC databases.
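A minimal sketch of the "qualify all table identifiers" route for option 2, assuming a hypothetical `realm_<id>` naming convention for per-realm schemas. It validates the realm ID before it reaches SQL and quotes identifiers the SQL-standard way (double quotes, embedded quotes doubled), which is what PostgreSQL expects.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the "realm_" + realmId schema naming rule is an assumption.
final class RealmSchemaSql {
    // Lowercase-only keeps quoted PostgreSQL identifiers unambiguous (they are case-sensitive).
    private static final Pattern SAFE_REALM = Pattern.compile("[a-z0-9_]{1,48}");

    private RealmSchemaSql() {}

    static String schemaFor(String realmId) {
        if (!SAFE_REALM.matcher(realmId).matches()) {
            throw new IllegalArgumentException("unsupported realm id: " + realmId);
        }
        return "realm_" + realmId;
    }

    // SQL-standard quoting: wrap in double quotes and double any embedded quote.
    static String quoteIdent(String ident) {
        return "\"" + ident.replace("\"", "\"\"") + "\"";
    }

    static String qualified(String realmId, String table) {
        return quoteIdent(schemaFor(realmId)) + "." + quoteIdent(table);
    }

    // Values still go through PreparedStatement parameters; only identifiers are inlined.
    static String selectById(String realmId, String table) {
        return "SELECT * FROM " + qualified(realmId, table) + " WHERE id = ?";
    }
}
```

Note that MySQL quotes identifiers with backticks unless the `ANSI_QUOTES` SQL mode is enabled, and in MySQL "schema" is a synonym for "database", which echoes the portability concern above.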
> > > > > > > > >
> > > > > > > > > That's why I think Option 3 is the most portable one, and the
> > > > > > > > > easiest for users or administrators to configure. As Pierre
> > > > > > > > > noted, it is subject to noisy-neighbor interference, but to
> > > > > > > > > some extent, I think interference could also happen with
> > > > > > > > > separate schemas as in option 2.
> > > > > > > > >
> > > > > > > > > Just my 2 cents.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Alex
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Apr 15, 2025 at 4:00 PM Dmitri Bourlatchkov <di...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for your perspective, Pierre! You make good points
> > > > > > > > > > and I agree with them.
> > > > > > > > > >
> > > > > > > > > > From my POV, I'd add that we probably need to take
> > > > > > > > > > deployment concerns into account too.
> > > > > > > > > >
> > > > > > > > > > If the deployment uses the database-per-realm approach
> > > > > > > > > > (option 1), then someone has to provide database connection
> > > > > > > > > > parameters (including secrets). If that is the deployment
> > > > > > > > > > administrator, then the admin necessarily has to be aware of
> > > > > > > > > > all realms and effectively has control of the data in all
> > > > > > > > > > realms. Isolation is achieved only for end users.
> > > > > > > > > >
> > > > > > > > > > That said, even with option 3 the deployment owner has
> > > > > > > > > > control over all realms, and end users are isolated as far
> > > > > > > > > > as their access to APIs is concerned. End users cannot
> > > > > > > > > > discover each other's data (barring coding mistakes in
> > > > > > > > > > Polaris). The same goes for option 2, as it's the middle
> > > > > > > > > > ground.
> > > > > > > > > >
> > > > > > > > > > I do not see any material difference between options 1, 2
> > > > > > > > > > and 3 from the end user's perspective.
> > > > > > > > > >
> > > > > > > > > > If, however, the database connection parameters are not
> > > > > > > > > > controlled by the administrator, but by the end user who
> > > > > > > > > > wants to define a realm, then Polaris needs to expose the
> > > > > > > > > > management of database connections and secrets. This may be
> > > > > > > > > > a valuable feature, but I believe it is far beyond current
> > > > > > > > > > Polaris backend capabilities. I do not think going this way
> > > > > > > > > > is justified at this time.
> > > > > > > > > >
> > > > > > > > > > I'd like to propose a hybrid approach where Polaris provides
> > > > > > > > > > capabilities (and config) for administrators to choose
> > > > > > > > > > between options 1, 2 and 3 according to their specific
> > > > > > > > > > deployment concerns.
> > > > > > > > > >
> > > > > > > > > > This means that the primary key has to include the realm
> > > > > > > > > > ID, because if the Polaris code does not provide it, then
> > > > > > > > > > the admin will not be able to choose option 3 at runtime.
> > > > > > > > > >
> > > > > > > > > > WDYT?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Dmitri.
> > > > > > > > > >
> > > > > > > > > > On Tue, Apr 15, 2025 at 8:35 AM Pierre Laporte <pie...@pingtimeout.fr>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Prashant
> > > > > > > > > > >
> > > > > > > > > > > I guess the answer will depend on how easy it should be
> > > > > > > > > > > for Polaris to support multi-tenancy.
> > > > > > > > > > >
> > > > > > > > > > > A separate database per realm would allow administrators
> > > > > > > > > > > to limit the amount of resources that a realm can consume
> > > > > > > > > > > (e.g. the maximum number of database connections).
> > > > > > > > > > > Indeed, it would be one of the strongest isolation modes.
> > > > > > > > > > > However, the code would need to support a complete
> > > > > > > > > > > database configuration per realm (think username and
> > > > > > > > > > > password, and possibly IP address) if the goal is to
> > > > > > > > > > > match Postgres capabilities. In terms of backup/restore,
> > > > > > > > > > > it is the most flexible option.
> > > > > > > > > > >
> > > > > > > > > > > A "one schema per realm" approach would be simpler in
> > > > > > > > > > > terms of datasource configuration. However, there would
> > > > > > > > > > > be less isolation between realms, and a resource
> > > > > > > > > > > utilization spike on one realm could impact the
> > > > > > > > > > > performance of another realm. It is as flexible as option
> > > > > > > > > > > #1 regarding backup and restore.
> > > > > > > > > > >
> > > > > > > > > > > A "realm as part of the primary key" approach is the most
> > > > > > > > > > > efficient, in that the cost of adding tenants is close to
> > > > > > > > > > > zero. As in option #2, there is no real resource
> > > > > > > > > > > isolation between tenants, and a noisy-neighbor situation
> > > > > > > > > > > is a possible issue. The biggest difference is in backup
> > > > > > > > > > > and restore. Consider the case where data is accidentally
> > > > > > > > > > > wiped/corrupted/modified/... in a given tenant and
> > > > > > > > > > > administrators want to restore it to a previous state.
> > > > > > > > > > > With this approach, it is much more complex, as Postgres
> > > > > > > > > > > does not (AFAIK) allow restoring tables partially.
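For option 3, one possible workaround for the per-tenant restore problem is to restore the full backup into a scratch database and then copy back only the affected realm's rows. The in-memory Java sketch below models just that last step; `RowKey` and `restoreRealm` are hypothetical names, and in SQL the two steps would roughly correspond to deleting the realm's rows and re-inserting them from the restored copy, ideally in one transaction.

```java
import java.util.Map;

// Hypothetical composite key with the realm ID as its leading component.
record RowKey(String realmId, long id) {}

final class RealmRestore {
    private RealmRestore() {}

    // Replaces one realm's rows in `live` with that realm's rows from `backup`;
    // rows belonging to other realms are left exactly as they are.
    static <V> void restoreRealm(Map<RowKey, V> live, Map<RowKey, V> backup, String realmId) {
        // Step 1, like DELETE FROM t WHERE realm_id = ?: drop the realm's current rows.
        live.keySet().removeIf(k -> k.realmId().equals(realmId));
        // Step 2, like INSERT INTO t SELECT ... FROM restored_copy WHERE realm_id = ?
        backup.forEach((k, v) -> {
            if (k.realmId().equals(realmId)) {
                live.put(k, v);
            }
        });
    }
}
```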
> > > > > > > > > > >
> > > > > > > > > > > Just my 2 cents
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > >
> > > > > > > > > > > Pierre
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Apr 15, 2025 at 12:42 AM Prashant Singh
> > > > > > > > > > > <prashant.si...@snowflake.com.invalid> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Dear Polaris Community,
> > > > > > > > > > > >
> > > > > > > > > > > > This email initiates a discussion regarding the
> > > > > > > > > > > > modeling of Realms within the Polaris project, following
> > > > > > > > > > > > its recent mention in my JDBC implementation pull
> > > > > > > > > > > > request:
> > > > > > > > > > > > https://github.com/apache/polaris/pull/1287/files#r2040383971.
> > > > > > > > > > > >
> > > > > > > > > > > > My current understanding, based on available
> > > > > > > > > > > > information, is that Realms were primarily intended for
> > > > > > > > > > > > isolation. Consequently, the EclipseLink implementation
> > > > > > > > > > > > treats each Realm as a separate database.
> > > > > > > > > > > >
> > > > > > > > > > > > As we are re-implementing this functionality, it was
> > > > > > > > > > > > suggested that we gather community feedback on the
> > > > > > > > > > > > optimal approach to modeling Realms.
> > > > > > > > > > > >
> > > > > > > > > > > > Based on my current understanding, here are potential
> > > > > > > > > > > > modeling options:
> > > > > > > > > > > >
> > > > > > > > > > > > *1. Separate Databases per Realm:*
> > > > > > > > > > > >
> > > > > > > > > > > >    - Each Realm would correspond to a distinct database.
> > > > > > > > > > > >    - This could be implemented using Quarkus custom data
> > > > > > > > > > > >    sources, with one data source per Realm.
> > > > > > > > > > > >
> > > > > > > > > > > > *2. Separate Schemas per Realm:*
> > > > > > > > > > > >
> > > > > > > > > > > >    - Each Realm would correspond to a distinct database
> > > > > > > > > > > >    schema within a single database.
> > > > > > > > > > > >    - Most database systems support two-part identifiers
> > > > > > > > > > > >    (<schema_name>.<table_name>), allowing for data
> > > > > > > > > > > >    isolation.
> > > > > > > > > > > >
> > > > > > > > > > > > *3. Realm as a Primary Key:*
> > > > > > > > > > > >
> > > > > > > > > > > >    - A realm identifier would be added as a primary key
> > > > > > > > > > > >    (or part of a composite primary key) to each Polaris
> > > > > > > > > > > >    table.
> > > > > > > > > > > >    - Data isolation would be enforced through filtering
> > > > > > > > > > > >    based on this key during data access.
> > > > > > > > > > > >
> > > > > > > > > > > > The optimal approach will likely depend on ease of use
> > > > > > > > > > > > and maintainability for database administrators.
> > > > > > > > > > > >
> > > > > > > > > > > > Please share your thoughts and preferences regarding
> > > > > > > > > > > > these options.
> > > > > > > > > > > >
> > > > > > > > > > > > Best regards,
> > > > > > > > > > > >
> > > > > > > > > > > > Prashant Singh
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
