Going with options 1 and 3 initially sounds good to me. This should simplify current JDBC PRs too.
We can certainly add capabilities later, because having realm ID in the PR does not preclude other deployment choices. Cheers, Dmitri. On Tue, Apr 15, 2025 at 1:49 PM Michael Collado <collado.m...@gmail.com> wrote: > My $.02 is that Option 1 is entirely possible using a DataSource that > dynamically creates Connections as needed. Option 1 is nice because, as > Pierre said, it gives admins the ability to dynamically allocate resources > to different clients as needed. > > Personally, I'm less inclined to option 3 just because it means potentially > larger blast radius if database credentials are ever leaked. But if most > end users are expecting to only manage a single realm, it's probably the > easiest and solves the most common use case. > > I like the option of combining 1 and 3 - by default, a single tenant > deployment writes to a single end database, but admins have the ability to > configure dynamic connections to different database endpoints if multiple > realms are supported. > > Mike > > On Tue, Apr 15, 2025 at 9:32 AM Alex Dutra <alex.du...@dremio.com.invalid> > wrote: > > > Hi all, > > > > I'm in agreement with Pierre, JB and Dmitri's points. I’d like to add > some > > context from the Quarkus configuration angle: > > > > Option 1, which involves distinct datasources, presents a challenge. > > Quarkus requires all datasources to be present and fully configured at > > build time. This requirement could be quite cumbersome for end users, > > making this option less user-friendly in practice. > > > > Regarding Option 2, while it's theoretically possible to manage multiple > > schemas with a single datasource, implementing this can be complex. To > > effectively work with different schemas in PostgreSQL, you would need to > > either qualify all table identifiers or adjust the `search_path` URL > > parameter. Additionally, other JDBC backends like MySQL don't support > > multiple schemas per database, which would make Option 2 less portable > > across different JDBC databases. > > > > That's why I think Option 3 is the most portable one, and the easiest for > > users or administrators to configure. As Pierre noted, it is subject to > > noisy neighbor interferences – but to some extent, I think interferences > > could also happen with separate schemas like in option 2. > > > > Just my 2 cents. > > > > Thanks, > > > > Alex > > > > > > On Tue, Apr 15, 2025 at 4:00 PM Dmitri Bourlatchkov <di...@apache.org> > > wrote: > > > > > Thanks for your perspective, Pierre! You make good points and I agree > > with > > > them. > > > > > > From my POV, I'd add that we probably need to take deployment concerns > > into > > > account too. > > > > > > If the deployment uses the database per realm approach (option 1) then > > > someone has to provide database connection parameters (including > > secrets). > > > If that is the deployment administrator, then the admin necessarily has > > to > > > be aware of all realms and effectively has control of the data in all > > > realms. Isolation is achieved only for end users. > > > > > > That said, even with option 3 the deployment owner has control over all > > > realms and end users are isolated as far as their access to APIs is > > > concerned. End users cannot discover each other's data (barring coding > > > mistakes in Polaris). The same goes for option 2 as it's the middle > > ground. > > > > > > I do not see any material difference between options 1, 2 and 3 from > the > > > end user's perspective. > > > > > > If, however, the database connection parameters are not controlled by > the > > > administrator, but by the end user who wants to define a realm, then > > > Polaris needs to expose managing database connections and secrets. This > > may > > > be a valuable feature, but I believe it is far beyond current Polaris > > > backend capabilities. I do not think going this way is justified at > this > > > time. > > > > > > I'd like to propose a hybrid approach where Polaris provides > capabilities > > > (and config) for the administrators to choose between options 1, 2, 3 > > > according to their specific deployment concerns. > > > > > > This means that the primary key has to include the realm ID, because if > > the > > > Polaris code does not provide it then the admin will not be able to > > choose > > > option 3 at runtime. > > > > > > WDYT? > > > > > > Thanks, > > > Dmitri. > > > > > > On Tue, Apr 15, 2025 at 8:35 AM Pierre Laporte <pie...@pingtimeout.fr> > > > wrote: > > > > > > > Hi Prashant > > > > > > > > I guess the answer will depend on how easy it should be for Polaris > to > > > > support multi-tenancy. > > > > > > > > A separate database per realm would allow administrators to limit the > > > > amount of resources that a realm can consume (e.g. the maximum number > > of > > > > database connections). Indeed, it would be one of the strongest > > > isolation > > > > mode. However, the code would need to support a complete database > > > > configuration per realm (think username and password and possibly IP > > > > address) if the goal is to match Postgres capabilities. In terms of > > > > backup/restore, it is the most flexible option. > > > > > > > > A "one schema per realm" approach would be a simpler approach, > > regarding > > > > datasource configuration. However, there would be less isolation > > between > > > > realms, and a resource utilization spike on one realm could impact > > > > performance of another realm. It is as flexible as option #1 > regarding > > > > backup and restore. > > > > > > > > A "realm as part of the primary key" approach is the most efficient > > way, > > > in > > > > that the cost of adding tenants is close to zero. Like in option #2, > > > there > > > > is no real resource isolation between tenants and a noisy-neighbor > > > > situation is a possible issue. The biggest difference is regarding > > > backup > > > > and restore. Consider the case where data is accidentally > > > > wiped/corrupted/modified/... in a given tenant and administrators > want > > to > > > > restore it to a previous state. With this approach, it is a much > more > > > > complex as Postgres does not (AFAIK) allow the possibility to restore > > > > tables partially. > > > > > > > > Just my 2 cents > > > > > > > > -- > > > > > > > > Pierre > > > > > > > > > > > > On Tue, Apr 15, 2025 at 12:42 AM Prashant Singh > > > > <prashant.si...@snowflake.com.invalid> wrote: > > > > > > > > > Dear Polaris Community, > > > > > > > > > > This email initiates a discussion regarding the modeling of Realms > > > within > > > > > the Polaris project, following its recent mention in my JDBC > > > > implementation > > > > > pull request: > > > > > https://github.com/apache/polaris/pull/1287/files#r2040383971. > > > > > > > > > > My current understanding, based on available information, is that > > > Realms > > > > > were primarily intended for isolation. Consequently, the > EclipseLink > > > > > implementation treats each Realm as a separate database. > > > > > > > > > > As we are re-implementing this functionality, it was suggested that > > we > > > > > gather community feedback on the optimal approach to modeling > Realms. > > > > > > > > > > Based on my current understanding, here are potential modeling > > options: > > > > > > > > > > *1. Separate Databases per Realm:* > > > > > > > > > > - Each Realm would correspond to a distinct database. > > > > > - This could be implemented using Quarkus custom data sources, > > with > > > > one > > > > > data source per Realm. > > > > > > > > > > *2. Separate Schemas per Realm:* > > > > > > > > > > - Each Realm would correspond to a distinct database schema > > within a > > > > > single database. > > > > > - Most database systems support two-part identifiers ( > > > > > <schema_name>.<table_name>), allowing for data isolation. > > > > > > > > > > *3. Realm as a Primary Key:* > > > > > > > > > > - A realm identifier would be added as a primary key (or part > of a > > > > > composite primary key) to each Polaris table. > > > > > - Data isolation would be enforced through filtering based on > this > > > key > > > > > during data access. > > > > > > > > > > The optimal approach will likely depend on ease of use and > > > > maintainability > > > > > for database administrators. > > > > > > > > > > Please share your thoughts and preferences regarding these options. > > > > > > > > > > Best regards, > > > > > > > > > > Prashant Singh > > > > > > > > > > > > > > >