Hi Folks, As we discussed, option 1 provides the strongest isolation, which should work particularly well for dynamically created data sources. Another significant benefit is that it's less complicated overall.
I'm not convinced we need both option 1 and option 3. For scenarios involving only a single realm, the concept of a realm becomes unnecessary. In that case, there's no need for any additional options, including option 3. Yufei On Tue, Apr 15, 2025 at 11:19 AM Dmitri Bourlatchkov <di...@apache.org> wrote: > Going with options 1 and 3 initially sounds good to me. This should > simplify current JDBC PRs too. > > We can certainly add capabilities later, because having realm ID in the PR > does not preclude other deployment choices. > > Cheers, > Dmitri. > > On Tue, Apr 15, 2025 at 1:49 PM Michael Collado <collado.m...@gmail.com> > wrote: > > > My $.02 is that Option 1 is entirely possible using a DataSource that > > dynamically creates Connections as needed. Option 1 is nice because, as > > Pierre said, it gives admins the ability to dynamically allocate > resources > > to different clients as needed. > > > > Personally, I'm less inclined to option 3 just because it means > potentially > > larger blast radius if database credentials are ever leaked. But if most > > end users are expecting to only manage a single realm, it's probably the > > easiest and solves the most common use case. > > > > I like the option of combining 1 and 3 - by default, a single tenant > > deployment writes to a single end database, but admins have the ability > to > > configure dynamic connections to different database endpoints if multiple > > realms are supported. > > > > Mike > > > > On Tue, Apr 15, 2025 at 9:32 AM Alex Dutra <alex.du...@dremio.com.invalid > > > > wrote: > > > > > Hi all, > > > > > > I'm in agreement with Pierre, JB and Dmitri's points. I’d like to add > > some > > > context from the Quarkus configuration angle: > > > > > > Option 1, which involves distinct datasources, presents a challenge. > > > Quarkus requires all datasources to be present and fully configured at > > > build time. This requirement could be quite cumbersome for end users, > > > making this option less user-friendly in practice. > > > > > > Regarding Option 2, while it's theoretically possible to manage > multiple > > > schemas with a single datasource, implementing this can be complex. To > > > effectively work with different schemas in PostgreSQL, you would need > to > > > either qualify all table identifiers or adjust the `search_path` URL > > > parameter. Additionally, other JDBC backends like MySQL don't support > > > multiple schemas per database, which would make Option 2 less portable > > > across different JDBC databases. > > > > > > That's why I think Option 3 is the most portable one, and the easiest > for > > > users or administrators to configure. As Pierre noted, it is subject to > > > noisy neighbor interferences – but to some extent, I think > interferences > > > could also happen with separate schemas like in option 2. > > > > > > Just my 2 cents. > > > > > > Thanks, > > > > > > Alex > > > > > > > > > On Tue, Apr 15, 2025 at 4:00 PM Dmitri Bourlatchkov <di...@apache.org> > > > wrote: > > > > > > > Thanks for your perspective, Pierre! You make good points and I agree > > > with > > > > them. > > > > > > > > From my POV, I'd add that we probably need to take deployment > concerns > > > into > > > > account too. > > > > > > > > If the deployment uses the database per realm approach (option 1) > then > > > > someone has to provide database connection parameters (including > > > secrets). > > > > If that is the deployment administrator, then the admin necessarily > has > > > to > > > > be aware of all realms and effectively has control of the data in all > > > > realms. Isolation is achieved only for end users. > > > > > > > > That said, even with option 3 the deployment owner has control over > all > > > > realms and end users are isolated as far as their access to APIs is > > > > concerned. End users cannot discover each other's data (barring > coding > > > > mistakes in Polaris). The same goes for option 2 as it's the middle > > > ground. > > > > > > > > I do not see any material difference between options 1, 2 and 3 from > > the > > > > end user's perspective. > > > > > > > > If, however, the database connection parameters are not controlled by > > the > > > > administrator, but by the end user who wants to define a realm, then > > > > Polaris needs to expose managing database connections and secrets. > This > > > may > > > > be a valuable feature, but I believe it is far beyond current Polaris > > > > backend capabilities. I do not think going this way is justified at > > this > > > > time. > > > > > > > > I'd like to propose a hybrid approach where Polaris provides > > capabilities > > > > (and config) for the administrators to choose between options 1, 2, 3 > > > > according to their specific deployment concerns. > > > > > > > > This means that the primary key has to include the realm ID, because > if > > > the > > > > Polaris code does not provide it then the admin will not be able to > > > choose > > > > option 3 at runtime. > > > > > > > > WDYT? > > > > > > > > Thanks, > > > > Dmitri. > > > > > > > > On Tue, Apr 15, 2025 at 8:35 AM Pierre Laporte < > pie...@pingtimeout.fr> > > > > wrote: > > > > > > > > > Hi Prashant > > > > > > > > > > I guess the answer will depend on how easy it should be for Polaris > > to > > > > > support multi-tenancy. > > > > > > > > > > A separate database per realm would allow administrators to limit > the > > > > > amount of resources that a realm can consume (e.g. the maximum > number > > > of > > > > > database connections). Indeed, it would be one of the strongest > > > > isolation > > > > > mode. However, the code would need to support a complete database > > > > > configuration per realm (think username and password and possibly > IP > > > > > address) if the goal is to match Postgres capabilities. In terms > of > > > > > backup/restore, it is the most flexible option. > > > > > > > > > > A "one schema per realm" approach would be a simpler approach, > > > regarding > > > > > datasource configuration. However, there would be less isolation > > > between > > > > > realms, and a resource utilization spike on one realm could impact > > > > > performance of another realm. It is as flexible as option #1 > > regarding > > > > > backup and restore. > > > > > > > > > > A "realm as part of the primary key" approach is the most efficient > > > way, > > > > in > > > > > that the cost of adding tenants is close to zero. Like in option > #2, > > > > there > > > > > is no real resource isolation between tenants and a noisy-neighbor > > > > > situation is a possible issue. The biggest difference is regarding > > > > backup > > > > > and restore. Consider the case where data is accidentally > > > > > wiped/corrupted/modified/... in a given tenant and administrators > > want > > > to > > > > > restore it to a previous state. With this approach, it is a much > > more > > > > > complex as Postgres does not (AFAIK) allow the possibility to > restore > > > > > tables partially. > > > > > > > > > > Just my 2 cents > > > > > > > > > > -- > > > > > > > > > > Pierre > > > > > > > > > > > > > > > On Tue, Apr 15, 2025 at 12:42 AM Prashant Singh > > > > > <prashant.si...@snowflake.com.invalid> wrote: > > > > > > > > > > > Dear Polaris Community, > > > > > > > > > > > > This email initiates a discussion regarding the modeling of > Realms > > > > within > > > > > > the Polaris project, following its recent mention in my JDBC > > > > > implementation > > > > > > pull request: > > > > > > https://github.com/apache/polaris/pull/1287/files#r2040383971. > > > > > > > > > > > > My current understanding, based on available information, is that > > > > Realms > > > > > > were primarily intended for isolation. Consequently, the > > EclipseLink > > > > > > implementation treats each Realm as a separate database. > > > > > > > > > > > > As we are re-implementing this functionality, it was suggested > that > > > we > > > > > > gather community feedback on the optimal approach to modeling > > Realms. > > > > > > > > > > > > Based on my current understanding, here are potential modeling > > > options: > > > > > > > > > > > > *1. Separate Databases per Realm:* > > > > > > > > > > > > - Each Realm would correspond to a distinct database. > > > > > > - This could be implemented using Quarkus custom data sources, > > > with > > > > > one > > > > > > data source per Realm. > > > > > > > > > > > > *2. Separate Schemas per Realm:* > > > > > > > > > > > > - Each Realm would correspond to a distinct database schema > > > within a > > > > > > single database. > > > > > > - Most database systems support two-part identifiers ( > > > > > > <schema_name>.<table_name>), allowing for data isolation. > > > > > > > > > > > > *3. Realm as a Primary Key:* > > > > > > > > > > > > - A realm identifier would be added as a primary key (or part > > of a > > > > > > composite primary key) to each Polaris table. > > > > > > - Data isolation would be enforced through filtering based on > > this > > > > key > > > > > > during data access. > > > > > > > > > > > > The optimal approach will likely depend on ease of use and > > > > > maintainability > > > > > > for database administrators. > > > > > > > > > > > > Please share your thoughts and preferences regarding these > options. > > > > > > > > > > > > Best regards, > > > > > > > > > > > > Prashant Singh > > > > > > > > > > > > > > > > > > > > >