Re: Discussion: Re-evaluating Realm Modeling in Polaris

Dmitri Bourlatchkov Fri, 18 Apr 2025 15:17:28 -0700

I believe users of Apache Polaris may want to share the database across
many realms in environments that do not need secure separation of realms.
This is hypothetical, at this point, of course. However, If option 3 is not
supported by code that use case will be impossible (or require subsequent
changes and releases).


Even with option 1 if multiple realms are mixed in memory, the isolation
guarantees are not much stronger than with option 3. If the main concern is
strong isolation, then Polaris Servers should run with only one realm per
instance (per JVM).

I propose to delegate this decision to the Polaris admin.

I do not think the code will have to be more complex to support both
options 1 and 3 compared to option 1 alone. In fact, as far as I can tell,
supporting option 1 plus multiple realms per JVM is more complex than
option 3 alone.

Cheers,
Dmitri.


On Fri, Apr 18, 2025 at 4:38 PM Yufei Gu <[email protected]> wrote:

> Hi Folks,
>
> As we discussed, option 1 provides the strongest isolation, which should
> work particularly well for dynamically created data sources. Another
> significant benefit is that it's less complicated overall.
>
> I'm not convinced we need both option 1 and option 3. For scenarios
> involving only a single realm, the concept of a realm becomes unnecessary.
> In that case, there's no need for any additional options, including option
> 3.
>
> Yufei
>
>
> On Tue, Apr 15, 2025 at 11:19 AM Dmitri Bourlatchkov <[email protected]>
> wrote:
>
> > Going with options 1 and 3 initially sounds good to me. This should
> > simplify current JDBC PRs too.
> >
> > We can certainly add capabilities later, because having realm ID in the
> PR
> > does not preclude other deployment choices.
> >
> > Cheers,
> > Dmitri.
> >
> > On Tue, Apr 15, 2025 at 1:49 PM Michael Collado <[email protected]>
> > wrote:
> >
> > > My $.02 is that Option 1 is entirely possible using a DataSource that
> > > dynamically creates Connections as needed. Option 1 is nice because, as
> > > Pierre said, it gives admins the ability to dynamically allocate
> > resources
> > > to different clients as needed.
> > >
> > > Personally, I'm less inclined to option 3 just because it means
> > potentially
> > > larger blast radius if database credentials are ever leaked. But if
> most
> > > end users are expecting to only manage a single realm, it's probably
> the
> > > easiest and solves the most common use case.
> > >
> > > I like the option of combining 1 and 3 - by default, a single tenant
> > > deployment writes to a single end database, but admins have the ability
> > to
> > > configure dynamic connections to different database endpoints if
> multiple
> > > realms are supported.
> > >
> > > Mike
> > >
> > > On Tue, Apr 15, 2025 at 9:32 AM Alex Dutra
> <[email protected]
> > >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'm in agreement with Pierre, JB and Dmitri's points. I’d like to add
> > > some
> > > > context from the Quarkus configuration angle:
> > > >
> > > > Option 1, which involves distinct datasources, presents a challenge.
> > > > Quarkus requires all datasources to be present and fully configured
> at
> > > > build time. This requirement could be quite cumbersome for end users,
> > > > making this option less user-friendly in practice.
> > > >
> > > > Regarding Option 2, while it's theoretically possible to manage
> > multiple
> > > > schemas with a single datasource, implementing this can be complex.
> To
> > > > effectively work with different schemas in PostgreSQL, you would need
> > to
> > > > either qualify all table identifiers or adjust the `search_path` URL
> > > > parameter. Additionally, other JDBC backends like MySQL don't support
> > > > multiple schemas per database, which would make Option 2 less
> portable
> > > > across different JDBC databases.
> > > >
> > > > That's why I think Option 3 is the most portable one, and the easiest
> > for
> > > > users or administrators to configure. As Pierre noted, it is subject
> to
> > > > noisy neighbor interferences – but to some extent, I think
> > interferences
> > > > could also happen with separate schemas like in option 2.
> > > >
> > > > Just my 2 cents.
> > > >
> > > > Thanks,
> > > >
> > > > Alex
> > > >
> > > >
> > > > On Tue, Apr 15, 2025 at 4:00 PM Dmitri Bourlatchkov <
> [email protected]>
> > > > wrote:
> > > >
> > > > > Thanks for your perspective, Pierre! You make good points and I
> agree
> > > > with
> > > > > them.
> > > > >
> > > > > From my POV, I'd add that we probably need to take deployment
> > concerns
> > > > into
> > > > > account too.
> > > > >
> > > > > If the deployment uses the database per realm approach (option 1)
> > then
> > > > > someone has to provide database connection parameters (including
> > > > secrets).
> > > > > If that is the deployment administrator, then the admin necessarily
> > has
> > > > to
> > > > > be aware of all realms and effectively has control of the data in
> all
> > > > > realms. Isolation is achieved only for end users.
> > > > >
> > > > > That said, even with option 3 the deployment owner has control over
> > all
> > > > > realms and end users are isolated as far as their access to APIs is
> > > > > concerned. End users cannot discover each other's data (barring
> > coding
> > > > > mistakes in Polaris). The same goes for option 2 as it's the middle
> > > > ground.
> > > > >
> > > > > I do not see any material difference between options 1, 2 and 3
> from
> > > the
> > > > > end user's perspective.
> > > > >
> > > > > If, however, the database connection parameters are not controlled
> by
> > > the
> > > > > administrator, but by the end user who wants to define a realm,
> then
> > > > > Polaris needs to expose managing database connections and secrets.
> > This
> > > > may
> > > > > be a valuable feature, but I believe it is far beyond current
> Polaris
> > > > > backend capabilities. I do not think going this way is justified at
> > > this
> > > > > time.
> > > > >
> > > > > I'd like to propose a hybrid approach where Polaris provides
> > > capabilities
> > > > > (and config) for the administrators to choose between options 1,
> 2, 3
> > > > > according to their specific deployment concerns.
> > > > >
> > > > > This means that the primary key has to include the realm ID,
> because
> > if
> > > > the
> > > > > Polaris code does not provide it then the admin will not be able to
> > > > choose
> > > > > option 3 at runtime.
> > > > >
> > > > > WDYT?
> > > > >
> > > > > Thanks,
> > > > > Dmitri.
> > > > >
> > > > > On Tue, Apr 15, 2025 at 8:35 AM Pierre Laporte <
> > [email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi Prashant
> > > > > >
> > > > > > I guess the answer will depend on how easy it should be for
> Polaris
> > > to
> > > > > > support multi-tenancy.
> > > > > >
> > > > > > A separate database per realm would allow administrators to limit
> > the
> > > > > > amount of resources that a realm can consume (e.g. the maximum
> > number
> > > > of
> > > > > > database connections).  Indeed, it would be one of the strongest
> > > > > isolation
> > > > > > mode.  However, the code would need to support a complete
> database
> > > > > > configuration per realm (think username and password and possibly
> > IP
> > > > > > address) if the goal is to match Postgres capabilities.  In terms
> > of
> > > > > > backup/restore, it is the most flexible option.
> > > > > >
> > > > > > A "one schema per realm" approach would be a simpler approach,
> > > > regarding
> > > > > > datasource configuration.  However, there would be less isolation
> > > > between
> > > > > > realms, and a resource utilization spike on one realm could
> impact
> > > > > > performance of another realm.  It is as flexible as option #1
> > > regarding
> > > > > > backup and restore.
> > > > > >
> > > > > > A "realm as part of the primary key" approach is the most
> efficient
> > > > way,
> > > > > in
> > > > > > that the cost of adding tenants is close to zero.  Like in option
> > #2,
> > > > > there
> > > > > > is no real resource isolation between tenants and a
> noisy-neighbor
> > > > > > situation is a possible issue.  The biggest difference is
> regarding
> > > > > backup
> > > > > > and restore.  Consider the case where data is accidentally
> > > > > > wiped/corrupted/modified/... in a given tenant and administrators
> > > want
> > > > to
> > > > > > restore it to a previous state.  With this approach, it is a much
> > > more
> > > > > > complex as Postgres does not (AFAIK) allow the possibility to
> > restore
> > > > > > tables partially.
> > > > > >
> > > > > > Just my 2 cents
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Pierre
> > > > > >
> > > > > >
> > > > > > On Tue, Apr 15, 2025 at 12:42 AM Prashant Singh
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > > Dear Polaris Community,
> > > > > > >
> > > > > > > This email initiates a discussion regarding the modeling of
> > Realms
> > > > > within
> > > > > > > the Polaris project, following its recent mention in my JDBC
> > > > > > implementation
> > > > > > > pull request:
> > > > > > > https://github.com/apache/polaris/pull/1287/files#r2040383971.
> > > > > > >
> > > > > > > My current understanding, based on available information, is
> that
> > > > > Realms
> > > > > > > were primarily intended for isolation. Consequently, the
> > > EclipseLink
> > > > > > > implementation treats each Realm as a separate database.
> > > > > > >
> > > > > > > As we are re-implementing this functionality, it was suggested
> > that
> > > > we
> > > > > > > gather community feedback on the optimal approach to modeling
> > > Realms.
> > > > > > >
> > > > > > > Based on my current understanding, here are potential modeling
> > > > options:
> > > > > > >
> > > > > > > *1. Separate Databases per Realm:*
> > > > > > >
> > > > > > >    - Each Realm would correspond to a distinct database.
> > > > > > >    - This could be implemented using Quarkus custom data
> sources,
> > > > with
> > > > > > one
> > > > > > >    data source per Realm.
> > > > > > >
> > > > > > > *2. Separate Schemas per Realm:*
> > > > > > >
> > > > > > >    - Each Realm would correspond to a distinct database schema
> > > > within a
> > > > > > >    single database.
> > > > > > >    - Most database systems support two-part identifiers (
> > > > > > >    <schema_name>.<table_name>), allowing for data isolation.
> > > > > > >
> > > > > > > *3. Realm as a Primary Key:*
> > > > > > >
> > > > > > >    - A realm identifier would be added as a primary key (or
> part
> > > of a
> > > > > > >    composite primary key) to each Polaris table.
> > > > > > >    - Data isolation would be enforced through filtering based
> on
> > > this
> > > > > key
> > > > > > >    during data access.
> > > > > > >
> > > > > > > The optimal approach will likely depend on ease of use and
> > > > > > maintainability
> > > > > > > for database administrators.
> > > > > > >
> > > > > > > Please share your thoughts and preferences regarding these
> > options.
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Prashant Singh
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Discussion: Re-evaluating Realm Modeling in Polaris

Reply via email to