To clarify, Polaris *does* support multi-tenancy. What’s currently limited with Option 1 is *account-level* multi-tenancy specifically in the context of EclipseLink.
1. *Multi-account support* is most relevant when a vendor wants to commercialize Polaris and offer it as a service to customers. It won't be a popular choice across all Apache Polaris adoption. I have never heard anyone asking for it in the Polaris community.

2. *Promoting realmId for multi-account usage* may make sense for NoSQL backends, which are typically distributed and scale well. I believe this is what Snowflake and Dremio do with managed Polaris. However, this thread is focused on the *Postgres JDBC implementation*, which often runs on a single-node setup. In that case, Option 3 could introduce performance bottlenecks.

3. *Option 1 provides stronger isolation*. Without it, a bug in the persistence layer or a misconfiguration at the RDBMS level could more easily cause cross-account data leakage. Remember that a realm may represent environments like dev, QA, or prod, so isolation matters (ref <https://github.com/polaris-catalog/polaris/blob/main/polaris-core/src/main/java/org/apache/polaris/core/context/RealmContext.java#L23-L23>). Plus, as Pierre said, a noisy neighbor could impact cross-account performance, and keep in mind that it’s a single-node database.

4. A solution that mixes Option 1 and Option 3 would be a *breaking change*. It would require not only schema updates but also modifications to admin tooling like the bootstrap logic. If there's strong interest, I’d suggest pursuing that as a *separate proposal*.

Given all this, I don’t think the realmId schema change should block the JDBC implementation work.

Yufei

On Tue, Apr 22, 2025 at 5:57 AM Alex Dutra <alex.du...@dremio.com.invalid> wrote:

> Hi all,
>
> I also would like to reiterate that Quarkus has no particular support for multi-tenancy with options 1 or 2: unless there are only a handful of datasources to use, and they can all be fixed at build time (which I think is not the case here), we'd need to manage the datasources and their connection pools ourselves.
> I hope we are all aware of that and OK with it.
>
> Thanks,
>
> Alex
>
> On Mon, Apr 21, 2025 at 11:06 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
>
> > My point is that if we do not include realm ID in the Primary Key (option 1), then we're effectively forcing all users to deploy Polaris with a DataSource per Realm approach. I do not see how we can decouple this concern from the JDBC schema. Any subsequent schema changes will complicate upgrades.
> >
> > My personal opinion is that we do not have to force users this way (and offer deployment flexibility as discussed previously).
> >
> > I do not really see any operational ambiguity in option 3. Administrators have to define a DataSource anyway. Diligent Administrators have to understand the JDBC schema anyway. If the config defaults are such that reusing a DataSource for many realms is _not_ allowed, an Administrator cannot mix data by mistake.
> >
> > Also, I believe the extra code complexity is negligible compared to the complexity of ensuring correct operation during concurrent updates.
> >
> > While it is not my intention to block going with option 1 only, I believe we have to make project decisions with clarity, therefore I raise this point (again) and ask people to acknowledge that this is indeed the direction we want to go.
> >
> > Thanks,
> > Dmitri.
> >
> > On Mon, Apr 21, 2025 at 1:22 PM Prashant Singh <prashant.si...@snowflake.com.invalid> wrote:
> >
> > > Hey All,
> > >
> > > Based on our recent discussion and the PR feedback, it seems like we need more in-depth conversations to align on the best path forward.
> > >
> > > Considering this, I'd like to propose we decouple this particular feature from the current JDBC implementation.
> > >
> > > My reasoning for this suggestion is as follows:
> > >
> > > 1. Following the precedent set by EclipseLink, the initial goal of the JDBC implementation was to *replace* EclipseLink.
> > >    This new feature feels like an addition to that core effort.
> > > 2. We anticipate revisiting schema changes when we discuss a separate DAO for the Entity layer. This means the schema we're currently considering isn't necessarily final.
> > > 3. Many users are eagerly awaiting the JDBC implementation due to the scalability limitations of the current EclipseLink solution. Decoupling this might allow us to deliver the core JDBC benefits sooner.
> > >
> > > I'd love to hear your thoughts on this proposal.
> > >
> > > Best, Prashant
> > >
> > > On Fri, Apr 18, 2025 at 3:57 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > >
> > > > Thanks for the thoughtful input.
> > > >
> > > > While it's true that some environments may not require strict separation between realms, the risk of incorrect usage or subtle cross-realm interference is significantly higher if we allow shared databases without enforcing strong boundaries.
> > > >
> > > > Option 1 gives us strong, predictable isolation with minimal complexity and fewer edge cases. Yes, if multiple realms are mixed in the same JVM even with option 1, isolation may still be compromised, but at least the design makes this explicit and easier to reason about. Running one realm per Polaris instance is a reasonable solution for environments that value isolation, and option 1 just works, while option 3 adds unnecessary complexity.
> > > >
> > > > I believe adding support for both option 1 and option 3 introduces not just code complexity, but also operational ambiguity and a burden on users to fully understand the trade-offs. Instead of delegating this to admins, we should first aim for clarity and safety in the design.
> > > >
> > > > We can always revisit this in the future if a strong real-world use case arises.
> > > > For now, I’d prefer we keep the design simple and unambiguous.
> > > >
> > > > Yufei
> > > >
> > > > On Fri, Apr 18, 2025 at 3:17 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
> > > >
> > > > > I believe users of Apache Polaris may want to share the database across many realms in environments that do not need secure separation of realms. This is hypothetical, at this point, of course. However, if option 3 is not supported by code, that use case will be impossible (or require subsequent changes and releases).
> > > > >
> > > > > Even with option 1, if multiple realms are mixed in memory, the isolation guarantees are not much stronger than with option 3. If the main concern is strong isolation, then Polaris Servers should run with only one realm per instance (per JVM).
> > > > >
> > > > > I propose to delegate this decision to the Polaris admin.
> > > > >
> > > > > I do not think the code will have to be more complex to support both options 1 and 3 compared to option 1 alone. In fact, as far as I can tell, supporting option 1 plus multiple realms per JVM is more complex than option 3 alone.
> > > > >
> > > > > Cheers,
> > > > > Dmitri.
> > > > >
> > > > > On Fri, Apr 18, 2025 at 4:38 PM Yufei Gu <flyrain...@gmail.com> wrote:
> > > > >
> > > > > > Hi Folks,
> > > > > >
> > > > > > As we discussed, option 1 provides the strongest isolation, which should work particularly well for dynamically created data sources. Another significant benefit is that it's less complicated overall.
> > > > > >
> > > > > > I'm not convinced we need both option 1 and option 3. For scenarios involving only a single realm, the concept of a realm becomes unnecessary.
> > > > > > In that case, there's no need for any additional options, including option 3.
> > > > > >
> > > > > > Yufei
> > > > > >
> > > > > > On Tue, Apr 15, 2025 at 11:19 AM Dmitri Bourlatchkov <di...@apache.org> wrote:
> > > > > >
> > > > > > > Going with options 1 and 3 initially sounds good to me. This should simplify current JDBC PRs too.
> > > > > > >
> > > > > > > We can certainly add capabilities later, because having realm ID in the PR does not preclude other deployment choices.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Dmitri.
> > > > > > >
> > > > > > > On Tue, Apr 15, 2025 at 1:49 PM Michael Collado <collado.m...@gmail.com> wrote:
> > > > > > >
> > > > > > > > My $.02 is that Option 1 is entirely possible using a DataSource that dynamically creates Connections as needed. Option 1 is nice because, as Pierre said, it gives admins the ability to dynamically allocate resources to different clients as needed.
> > > > > > > >
> > > > > > > > Personally, I'm less inclined to option 3 just because it means potentially larger blast radius if database credentials are ever leaked. But if most end users are expecting to only manage a single realm, it's probably the easiest and solves the most common use case.
> > > > > > > >
> > > > > > > > I like the option of combining 1 and 3 - by default, a single tenant deployment writes to a single end database, but admins have the ability to configure dynamic connections to different database endpoints if multiple realms are supported.
> > > > > > > >
> > > > > > > > Mike
> > > > > > > >
> > > > > > > > On Tue, Apr 15, 2025 at 9:32 AM Alex Dutra <alex.du...@dremio.com.invalid> wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I'm in agreement with Pierre, JB and Dmitri's points. I’d like to add some context from the Quarkus configuration angle:
> > > > > > > > >
> > > > > > > > > Option 1, which involves distinct datasources, presents a challenge. Quarkus requires all datasources to be present and fully configured at build time. This requirement could be quite cumbersome for end users, making this option less user-friendly in practice.
> > > > > > > > >
> > > > > > > > > Regarding Option 2, while it's theoretically possible to manage multiple schemas with a single datasource, implementing this can be complex. To effectively work with different schemas in PostgreSQL, you would need to either qualify all table identifiers or adjust the `search_path` URL parameter. Additionally, other JDBC backends like MySQL don't support multiple schemas per database, which would make Option 2 less portable across different JDBC databases.
> > > > > > > > >
> > > > > > > > > That's why I think Option 3 is the most portable one, and the easiest for users or administrators to configure. As Pierre noted, it is subject to noisy-neighbor interference, but to some extent, I think interference could also happen with separate schemas like in option 2.
> > > > > > > > >
> > > > > > > > > Just my 2 cents.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Alex
> > > > > > > > >
> > > > > > > > > On Tue, Apr 15, 2025 at 4:00 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for your perspective, Pierre! You make good points and I agree with them.
> > > > > > > > > >
> > > > > > > > > > From my POV, I'd add that we probably need to take deployment concerns into account too.
> > > > > > > > > >
> > > > > > > > > > If the deployment uses the database per realm approach (option 1) then someone has to provide database connection parameters (including secrets). If that is the deployment administrator, then the admin necessarily has to be aware of all realms and effectively has control of the data in all realms. Isolation is achieved only for end users.
> > > > > > > > > >
> > > > > > > > > > That said, even with option 3 the deployment owner has control over all realms and end users are isolated as far as their access to APIs is concerned. End users cannot discover each other's data (barring coding mistakes in Polaris). The same goes for option 2 as it's the middle ground.
> > > > > > > > > >
> > > > > > > > > > I do not see any material difference between options 1, 2 and 3 from the end user's perspective.
> > > > > > > > > >
> > > > > > > > > > If, however, the database connection parameters are not controlled by the administrator, but by the end user who wants to define a realm, then Polaris needs to expose managing database connections and secrets. This may be a valuable feature, but I believe it is far beyond current Polaris backend capabilities. I do not think going this way is justified at this time.
> > > > > > > > > >
> > > > > > > > > > I'd like to propose a hybrid approach where Polaris provides capabilities (and config) for the administrators to choose between options 1, 2, 3 according to their specific deployment concerns.
> > > > > > > > > >
> > > > > > > > > > This means that the primary key has to include the realm ID, because if the Polaris code does not provide it then the admin will not be able to choose option 3 at runtime.
> > > > > > > > > >
> > > > > > > > > > WDYT?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Dmitri.
> > > > > > > > > >
> > > > > > > > > > On Tue, Apr 15, 2025 at 8:35 AM Pierre Laporte <pie...@pingtimeout.fr> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Prashant
> > > > > > > > > > >
> > > > > > > > > > > I guess the answer will depend on how easy it should be for Polaris to support multi-tenancy.
> > > > > > > > > > >
> > > > > > > > > > > A separate database per realm would allow administrators to limit the amount of resources that a realm can consume (e.g. the maximum number of database connections). Indeed, it would be one of the strongest isolation modes. However, the code would need to support a complete database configuration per realm (think username and password and possibly IP address) if the goal is to match Postgres capabilities. In terms of backup/restore, it is the most flexible option.
> > > > > > > > > > >
> > > > > > > > > > > A "one schema per realm" approach would be simpler regarding datasource configuration. However, there would be less isolation between realms, and a resource utilization spike on one realm could impact performance of another realm. It is as flexible as option #1 regarding backup and restore.
> > > > > > > > > > >
> > > > > > > > > > > A "realm as part of the primary key" approach is the most efficient way, in that the cost of adding tenants is close to zero. Like in option #2, there is no real resource isolation between tenants and a noisy-neighbor situation is a possible issue. The biggest difference is regarding backup and restore.
> > > > > > > > > > > Consider the case where data is accidentally wiped/corrupted/modified/... in a given tenant and administrators want to restore it to a previous state. With this approach, it is much more complex, as Postgres does not (AFAIK) allow restoring tables partially.
> > > > > > > > > > >
> > > > > > > > > > > Just my 2 cents
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > >
> > > > > > > > > > > Pierre
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Apr 15, 2025 at 12:42 AM Prashant Singh <prashant.si...@snowflake.com.invalid> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Dear Polaris Community,
> > > > > > > > > > > >
> > > > > > > > > > > > This email initiates a discussion regarding the modeling of Realms within the Polaris project, following its recent mention in my JDBC implementation pull request: https://github.com/apache/polaris/pull/1287/files#r2040383971.
> > > > > > > > > > > >
> > > > > > > > > > > > My current understanding, based on available information, is that Realms were primarily intended for isolation. Consequently, the EclipseLink implementation treats each Realm as a separate database.
> > > > > > > > > > > >
> > > > > > > > > > > > As we are re-implementing this functionality, it was suggested that we gather community feedback on the optimal approach to modeling Realms.
> > > > > > > > > > > >
> > > > > > > > > > > > Based on my current understanding, here are potential modeling options:
> > > > > > > > > > > >
> > > > > > > > > > > > *1. Separate Databases per Realm:*
> > > > > > > > > > > >
> > > > > > > > > > > >    - Each Realm would correspond to a distinct database.
> > > > > > > > > > > >    - This could be implemented using Quarkus custom data sources, with one data source per Realm.
> > > > > > > > > > > >
> > > > > > > > > > > > *2. Separate Schemas per Realm:*
> > > > > > > > > > > >
> > > > > > > > > > > >    - Each Realm would correspond to a distinct database schema within a single database.
> > > > > > > > > > > >    - Most database systems support two-part identifiers (<schema_name>.<table_name>), allowing for data isolation.
> > > > > > > > > > > >
> > > > > > > > > > > > *3. Realm as a Primary Key:*
> > > > > > > > > > > >
> > > > > > > > > > > >    - A realm identifier would be added as a primary key (or part of a composite primary key) to each Polaris table.
> > > > > > > > > > > >    - Data isolation would be enforced through filtering based on this key during data access.
> > > > > > > > > > > >
> > > > > > > > > > > > The optimal approach will likely depend on ease of use and maintainability for database administrators.
> > > > > > > > > > > >
> > > > > > > > > > > > Please share your thoughts and preferences regarding these options.
> > > > > > > > > > > >
> > > > > > > > > > > > Best regards,
> > > > > > > > > > > >
> > > > > > > > > > > > Prashant Singh
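
For readers skimming the thread, here is a minimal sketch of what option 3 ("realm as part of the primary key") from the original proposal means at the SQL level. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 instead of Postgres/JDBC so that it runs standalone, and the table name `entities`, its columns, and the helper functions are hypothetical, not the actual Polaris schema or code.

```python
import sqlite3

# Hypothetical table and column names; NOT the actual Polaris JDBC schema.
# The realm identifier is part of the composite primary key (option 3).
DDL = """
CREATE TABLE entities (
    realm_id TEXT    NOT NULL,
    id       INTEGER NOT NULL,
    name     TEXT    NOT NULL,
    PRIMARY KEY (realm_id, id)
)
"""


def open_store():
    """Create an in-memory database shared by all realms."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    return conn


def put_entity(conn, realm_id, entity_id, name):
    """Insert a row; every write carries the realm identifier."""
    conn.execute(
        "INSERT INTO entities (realm_id, id, name) VALUES (?, ?, ?)",
        (realm_id, entity_id, name),
    )


def list_entities(conn, realm_id):
    """Read rows for one realm; isolation relies on this filter."""
    rows = conn.execute(
        "SELECT id, name FROM entities WHERE realm_id = ? ORDER BY id",
        (realm_id,),
    )
    return rows.fetchall()


conn = open_store()
# The same entity id can exist in two realms because the key is composite.
put_entity(conn, "dev", 1, "catalog_a")
put_entity(conn, "prod", 1, "catalog_b")
print(list_entities(conn, "dev"))
print(list_entities(conn, "prod"))
```

In this sketch the cost of adding a realm is just new rows, which matches Pierre's efficiency point, while the isolation concern raised in the thread is also visible: a query that forgets the `WHERE realm_id = ?` filter would see every realm's data.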