Hey All,

Based on our recent discussion and the PR feedback, it seems like we need
more in-depth conversations to align on the best path forward.

Considering this, I'd like to propose we decouple this particular feature
from the current JDBC implementation.

My reasoning for this suggestion is as follows:

   1. Following the precedent set by EclipseLink, the initial goal of the
   JDBC implementation was to *replace* EclipseLink. This new feature feels
   like an addition to that core effort.
   2. We anticipate revisiting schema changes when we discuss a separate
   DAO for the Entity layer. This means the schema we're currently considering
   isn't necessarily final.
   3. Many users are eagerly awaiting the JDBC implementation due to the
   scalability limitations of the current EclipseLink solution. Decoupling
   this might allow us to deliver the core JDBC benefits sooner.

I'd love to hear your thoughts on this proposal.

Best, Prashant


On Fri, Apr 18, 2025 at 3:57 PM Yufei Gu <flyrain...@gmail.com> wrote:

> Thanks for the thoughtful input.
>
> While it's true that some environments may not require strict separation
> between realms, the risk of incorrect usage or subtle cross-realm
> interference is significantly higher if we allow shared databases without
> enforcing strong boundaries.
>
> Option 1 gives us strong, predictable isolation with minimal complexity and
> fewer edge cases. Yes, if multiple realms are mixed in the same JVM even
> with option 1, isolation may still be compromised, but at least the design
> makes this explicit and easier to reason about. Running one realm per
> Polaris instance is a reasonable solution for environments that value
> isolation, and option 1 just works, while option 3 adds unnecessary
> complexity.
>
> I believe adding support for both option 1 and option 3 introduces not just
> code complexity, but also operational ambiguity and a burden on users to
> fully understand the trade-offs. Instead of delegating this to admins, we
> should first aim for clarity and safety in the design.
>
> We can always revisit this in the future if a strong real-world use case
> arises. For now, I’d prefer we keep the design simple and unambiguous.
>
> Yufei
>
>
> On Fri, Apr 18, 2025 at 3:17 PM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
>
> > I believe users of Apache Polaris may want to share the database across
> > many realms in environments that do not need secure separation of realms.
> > This is hypothetical, at this point, of course. However, If option 3 is
> not
> > supported by code that use case will be impossible (or require subsequent
> > changes and releases).
> >
> > Even with option 1 if multiple realms are mixed in memory, the isolation
> > guarantees are not much stronger than with option 3. If the main concern
> is
> > strong isolation, then Polaris Servers should run with only one realm per
> > instance (per JVM).
> >
> > I propose to delegate this decision to the Polaris admin.
> >
> > I do not think the code will have to be more complex to support both
> > options 1 and 3 compared to option 1 alone. In fact, as far as I can
> tell,
> > supporting option 1 plus multiple realms per JVM is more complex than
> > option 3 alone.
> >
> > Cheers,
> > Dmitri.
> >
> >
> > On Fri, Apr 18, 2025 at 4:38 PM Yufei Gu <flyrain...@gmail.com> wrote:
> >
> > > Hi Folks,
> > >
> > > As we discussed, option 1 provides the strongest isolation, which
> should
> > > work particularly well for dynamically created data sources. Another
> > > significant benefit is that it's less complicated overall.
> > >
> > > I'm not convinced we need both option 1 and option 3. For scenarios
> > > involving only a single realm, the concept of a realm becomes
> > unnecessary.
> > > In that case, there's no need for any additional options, including
> > option
> > > 3.
> > >
> > > Yufei
> > >
> > >
> > > On Tue, Apr 15, 2025 at 11:19 AM Dmitri Bourlatchkov <di...@apache.org
> >
> > > wrote:
> > >
> > > > Going with options 1 and 3 initially sounds good to me. This should
> > > > simplify current JDBC PRs too.
> > > >
> > > > We can certainly add capabilities later, because having realm ID in
> the
> > > PR
> > > > does not preclude other deployment choices.
> > > >
> > > > Cheers,
> > > > Dmitri.
> > > >
> > > > On Tue, Apr 15, 2025 at 1:49 PM Michael Collado <
> > collado.m...@gmail.com>
> > > > wrote:
> > > >
> > > > > My $.02 is that Option 1 is entirely possible using a DataSource
> that
> > > > > dynamically creates Connections as needed. Option 1 is nice
> because,
> > as
> > > > > Pierre said, it gives admins the ability to dynamically allocate
> > > > resources
> > > > > to different clients as needed.
> > > > >
> > > > > Personally, I'm less inclined to option 3 just because it means
> > > > potentially
> > > > > larger blast radius if database credentials are ever leaked. But if
> > > most
> > > > > end users are expecting to only manage a single realm, it's
> probably
> > > the
> > > > > easiest and solves the most common use case.
> > > > >
> > > > > I like the option of combining 1 and 3 - by default, a single
> tenant
> > > > > deployment writes to a single end database, but admins have the
> > ability
> > > > to
> > > > > configure dynamic connections to different database endpoints if
> > > multiple
> > > > > realms are supported.
> > > > >
> > > > > Mike
> > > > >
> > > > > On Tue, Apr 15, 2025 at 9:32 AM Alex Dutra
> > > <alex.du...@dremio.com.invalid
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I'm in agreement with Pierre, JB and Dmitri's points. I’d like to
> > add
> > > > > some
> > > > > > context from the Quarkus configuration angle:
> > > > > >
> > > > > > Option 1, which involves distinct datasources, presents a
> > challenge.
> > > > > > Quarkus requires all datasources to be present and fully
> configured
> > > at
> > > > > > build time. This requirement could be quite cumbersome for end
> > users,
> > > > > > making this option less user-friendly in practice.
> > > > > >
> > > > > > Regarding Option 2, while it's theoretically possible to manage
> > > > multiple
> > > > > > schemas with a single datasource, implementing this can be
> complex.
> > > To
> > > > > > effectively work with different schemas in PostgreSQL, you would
> > need
> > > > to
> > > > > > either qualify all table identifiers or adjust the `search_path`
> > URL
> > > > > > parameter. Additionally, other JDBC backends like MySQL don't
> > support
> > > > > > multiple schemas per database, which would make Option 2 less
> > > portable
> > > > > > across different JDBC databases.
> > > > > >
> > > > > > That's why I think Option 3 is the most portable one, and the
> > easiest
> > > > for
> > > > > > users or administrators to configure. As Pierre noted, it is
> > subject
> > > to
> > > > > > noisy neighbor interferences – but to some extent, I think
> > > > interferences
> > > > > > could also happen with separate schemas like in option 2.
> > > > > >
> > > > > > Just my 2 cents.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Alex
> > > > > >
> > > > > >
> > > > > > On Tue, Apr 15, 2025 at 4:00 PM Dmitri Bourlatchkov <
> > > di...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for your perspective, Pierre! You make good points and I
> > > agree
> > > > > > with
> > > > > > > them.
> > > > > > >
> > > > > > > From my POV, I'd add that we probably need to take deployment
> > > > concerns
> > > > > > into
> > > > > > > account too.
> > > > > > >
> > > > > > > If the deployment uses the database per realm approach (option
> 1)
> > > > then
> > > > > > > someone has to provide database connection parameters
> (including
> > > > > > secrets).
> > > > > > > If that is the deployment administrator, then the admin
> > necessarily
> > > > has
> > > > > > to
> > > > > > > be aware of all realms and effectively has control of the data
> in
> > > all
> > > > > > > realms. Isolation is achieved only for end users.
> > > > > > >
> > > > > > > That said, even with option 3 the deployment owner has control
> > over
> > > > all
> > > > > > > realms and end users are isolated as far as their access to
> APIs
> > is
> > > > > > > concerned. End users cannot discover each other's data (barring
> > > > coding
> > > > > > > mistakes in Polaris). The same goes for option 2 as it's the
> > middle
> > > > > > ground.
> > > > > > >
> > > > > > > I do not see any material difference between options 1, 2 and 3
> > > from
> > > > > the
> > > > > > > end user's perspective.
> > > > > > >
> > > > > > > If, however, the database connection parameters are not
> > controlled
> > > by
> > > > > the
> > > > > > > administrator, but by the end user who wants to define a realm,
> > > then
> > > > > > > Polaris needs to expose managing database connections and
> > secrets.
> > > > This
> > > > > > may
> > > > > > > be a valuable feature, but I believe it is far beyond current
> > > Polaris
> > > > > > > backend capabilities. I do not think going this way is
> justified
> > at
> > > > > this
> > > > > > > time.
> > > > > > >
> > > > > > > I'd like to propose a hybrid approach where Polaris provides
> > > > > capabilities
> > > > > > > (and config) for the administrators to choose between options
> 1,
> > > 2, 3
> > > > > > > according to their specific deployment concerns.
> > > > > > >
> > > > > > > This means that the primary key has to include the realm ID,
> > > because
> > > > if
> > > > > > the
> > > > > > > Polaris code does not provide it then the admin will not be
> able
> > to
> > > > > > choose
> > > > > > > option 3 at runtime.
> > > > > > >
> > > > > > > WDYT?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dmitri.
> > > > > > >
> > > > > > > On Tue, Apr 15, 2025 at 8:35 AM Pierre Laporte <
> > > > pie...@pingtimeout.fr>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Prashant
> > > > > > > >
> > > > > > > > I guess the answer will depend on how easy it should be for
> > > Polaris
> > > > > to
> > > > > > > > support multi-tenancy.
> > > > > > > >
> > > > > > > > A separate database per realm would allow administrators to
> > limit
> > > > the
> > > > > > > > amount of resources that a realm can consume (e.g. the
> maximum
> > > > number
> > > > > > of
> > > > > > > > database connections).  Indeed, it would be one of the
> > strongest
> > > > > > > isolation
> > > > > > > > mode.  However, the code would need to support a complete
> > > database
> > > > > > > > configuration per realm (think username and password and
> > possibly
> > > > IP
> > > > > > > > address) if the goal is to match Postgres capabilities.  In
> > terms
> > > > of
> > > > > > > > backup/restore, it is the most flexible option.
> > > > > > > >
> > > > > > > > A "one schema per realm" approach would be a simpler
> approach,
> > > > > > regarding
> > > > > > > > datasource configuration.  However, there would be less
> > isolation
> > > > > > between
> > > > > > > > realms, and a resource utilization spike on one realm could
> > > impact
> > > > > > > > performance of another realm.  It is as flexible as option #1
> > > > > regarding
> > > > > > > > backup and restore.
> > > > > > > >
> > > > > > > > A "realm as part of the primary key" approach is the most
> > > efficient
> > > > > > way,
> > > > > > > in
> > > > > > > > that the cost of adding tenants is close to zero.  Like in
> > option
> > > > #2,
> > > > > > > there
> > > > > > > > is no real resource isolation between tenants and a
> > > noisy-neighbor
> > > > > > > > situation is a possible issue.  The biggest difference is
> > > regarding
> > > > > > > backup
> > > > > > > > and restore.  Consider the case where data is accidentally
> > > > > > > > wiped/corrupted/modified/... in a given tenant and
> > administrators
> > > > > want
> > > > > > to
> > > > > > > > restore it to a previous state.  With this approach, it is a
> > much
> > > > > more
> > > > > > > > complex as Postgres does not (AFAIK) allow the possibility to
> > > > restore
> > > > > > > > tables partially.
> > > > > > > >
> > > > > > > > Just my 2 cents
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Pierre
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Apr 15, 2025 at 12:42 AM Prashant Singh
> > > > > > > > <prashant.si...@snowflake.com.invalid> wrote:
> > > > > > > >
> > > > > > > > > Dear Polaris Community,
> > > > > > > > >
> > > > > > > > > This email initiates a discussion regarding the modeling of
> > > > Realms
> > > > > > > within
> > > > > > > > > the Polaris project, following its recent mention in my
> JDBC
> > > > > > > > implementation
> > > > > > > > > pull request:
> > > > > > > > >
> > https://github.com/apache/polaris/pull/1287/files#r2040383971.
> > > > > > > > >
> > > > > > > > > My current understanding, based on available information,
> is
> > > that
> > > > > > > Realms
> > > > > > > > > were primarily intended for isolation. Consequently, the
> > > > > EclipseLink
> > > > > > > > > implementation treats each Realm as a separate database.
> > > > > > > > >
> > > > > > > > > As we are re-implementing this functionality, it was
> > suggested
> > > > that
> > > > > > we
> > > > > > > > > gather community feedback on the optimal approach to
> modeling
> > > > > Realms.
> > > > > > > > >
> > > > > > > > > Based on my current understanding, here are potential
> > modeling
> > > > > > options:
> > > > > > > > >
> > > > > > > > > *1. Separate Databases per Realm:*
> > > > > > > > >
> > > > > > > > >    - Each Realm would correspond to a distinct database.
> > > > > > > > >    - This could be implemented using Quarkus custom data
> > > sources,
> > > > > > with
> > > > > > > > one
> > > > > > > > >    data source per Realm.
> > > > > > > > >
> > > > > > > > > *2. Separate Schemas per Realm:*
> > > > > > > > >
> > > > > > > > >    - Each Realm would correspond to a distinct database
> > schema
> > > > > > within a
> > > > > > > > >    single database.
> > > > > > > > >    - Most database systems support two-part identifiers (
> > > > > > > > >    <schema_name>.<table_name>), allowing for data
> isolation.
> > > > > > > > >
> > > > > > > > > *3. Realm as a Primary Key:*
> > > > > > > > >
> > > > > > > > >    - A realm identifier would be added as a primary key (or
> > > part
> > > > > of a
> > > > > > > > >    composite primary key) to each Polaris table.
> > > > > > > > >    - Data isolation would be enforced through filtering
> based
> > > on
> > > > > this
> > > > > > > key
> > > > > > > > >    during data access.
> > > > > > > > >
> > > > > > > > > The optimal approach will likely depend on ease of use and
> > > > > > > > maintainability
> > > > > > > > > for database administrators.
> > > > > > > > >
> > > > > > > > > Please share your thoughts and preferences regarding these
> > > > options.
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > >
> > > > > > > > > Prashant Singh
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to