Thanks, Dmitri! I agree the SPI itself is not the main concern here. IIUC, the question is less whether some realm awareness exists somewhere in the implementation and more what Polaris wants to make part of the initial built-in contract. My concern is that once the default routing story starts to look realm-driven, we may implicitly harden a broader model than we actually want to support at this stage.
I’m still biased toward keeping the v1 built-in path narrower:

- config-driven,
- a small fixed set of purpose/store buckets,
- and routing behavior that stays stable at the transaction / unit-of-work boundary.

That still leaves room for a pluggable resolver SPI and downstream customization, but it keeps OSS Polaris itself from committing too early to a more dynamic routing model before the config / migration / consistency story is fully settled.

So I think the main thing I’d want to make more explicit is the boundary between:

1. the extensibility the SPI permits, and
2. the built-in routing model Polaris is actually recommending and prepared to support in v1.

If we can make that boundary crisp, I’m much less worried about the abstraction itself.

-ej

On Thu, Mar 12, 2026 at 11:04 AM Dmitri Bourlatchkov <[email protected]> wrote:

> Hi EJ,
>
> In the current state of PR [3960], the DataSourceResolver interface enables customizing per-realm DataSource resolution.
>
> However, the Apache Polaris code is not complicated by any new per-realm logic related to DataSources. The OSS side keeps working as before.
>
> I'm not sure whether Subham intended to delegate this logic to custom implementations or not, but as the PR stands, the code looks pretty reasonable to me.
>
> In the current state of the codebase, I do not think we can completely avoid dealing with realms as they relate to DataSources. The decision about how to cross-link them has to be made somewhere. IMHO, the proposed DefaultDataSourceResolver looks like a good place to make that decision. It is certainly subject to further evolution if we need to make adjustments later.
>
> Before [3960], all realms implicitly received the default DataSource. This is now explicit in the code.
>
> [3960] https://github.com/apache/polaris/pull/3960
>
> Cheers,
> Dmitri.
>
> On Thu, Mar 12, 2026 at 2:29 AM EJ Wang <[email protected]> wrote:
>
> > Hi Subham, Dmitri, Yufei, JB,
> >
> > I’m generally aligned with the direction here, and I left a few more detailed comments on the PR.
> > At a high level, my main concern is that the current proposal may still be a bit ahead of the current story/scope. I’d lean toward keeping the first step narrower, preferring purpose-based routing over per-realm routing for v1, and making the supported model/config+migration story more explicit before the broader contract hardens.
> >
> > -ej
> >
> > On Wed, Mar 11, 2026 at 12:37 PM Jean-Baptiste Onofré <[email protected]> wrote:
> >
> > > Hi Subham,
> > >
> > > Thanks for this contribution. It's an interesting feature.
> > >
> > > As mentioned in the GitHub issues, I am fine with moving forward with a PR as long as it remains a Draft PR to help drive the discussion. I suggest linking a GitHub Discussion or a Design Doc within the PR to help build consensus.
> > >
> > > That said, I have a few initial comments:
> > >
> > > 1. I like the SPI approach used in the PR. This should become a standard in Polaris to facilitate custom implementations.
> > > 2. I agree that having a data source "per purpose" is a good idea. The main question is how we should handle the split:
> > > - Very granular (per entity)
> > > - By table "meaning"
> > > - By realm (this may not be granular enough)
> > >
> > > From a user standpoint, I believe we should keep it simple. It would be a great first step forward. For example, in the configuration (e.g., application.properties), it could look like:
> > > - polaris.datasource.entities=
> > > - polaris.datasource.events=
> > > - polaris.datasource.grants=
> > >
> > > Regards,
> > > JB
> > >
> > > On Tue, Mar 10, 2026 at 11:19 PM Yufei Gu <[email protected]> wrote:
> > >
> > > > Hi Subham,
> > > >
> > > > Thanks for working on this.
> > > > Given the complexity and long-term implications discussed in https://github.com/apache/polaris/issues/3890, I think a short design doc could still be helpful to capture the intended architecture and future evolution. Here are a few questions listed in the issue. I believe these should be answered before jumping to an implementation.
> > > >
> > > > 1. Should we split each potentially noisy table into its own dedicated data source? For example, one data source for events, one for metrics, and one for idempotency.
> > > > 2. Should we allow flexible grouping? For example, events and idempotency tables sharing one data source, while metrics uses another.
> > > > 3. Should we consider a different DS per realm instead of table-level splitting?
> > > > 4. How should schema version information be managed? If tables live in different data sources, how do we track and coordinate schema evolution?
> > > > 5. Should different data sources be allowed to point to different schemas or databases? This likely aligns with the isolation goal, but it implies that cross-table joins become difficult or impossible at the database level, leaving only in-memory joins as an option.
> > > > 6. Should different data sources be allowed to point to the same schema? If not, we need validation logic to detect and prevent misconfiguration.
> > > >
> > > > Yufei
> > > >
> > > > On Tue, Mar 10, 2026 at 7:33 AM Dmitri Bourlatchkov <[email protected]> wrote:
> > > >
> > > > > Hi Subham,
> > > > >
> > > > > Thanks again for your contribution!
> > > > >
> > > > > I believe PR 3960 moves in the right direction by establishing an SPI to delegate DataSource resolution logic to the runtime environment.
> > > > >
> > > > > It immediately allows custom implementations in downstream projects (if people wish to do that) and opens a way for supporting multiple DataSources in Apache Polaris (in follow-up PRs).
> > > > >
> > > > > I think the PR is pretty clear in itself and does not require any extra design docs. Let's review it in GH and merge when we have consensus.
> > > > >
> > > > > Cheers,
> > > > > Dmitri.
> > > > >
> > > > > On Tue, Mar 10, 2026 at 8:27 AM Subham Sangwan <[email protected]> wrote:
> > > > >
> > > > > > Hi Polaris Dev Team,
> > > > > >
> > > > > > I have opened PR #3960 [1] to introduce the foundational groundwork for multi-datasource support in JDBC persistence, addressing Issue #3890 [2]. The goal is to enable physical isolation of different persistence workloads (METASTORE, METRICS, EVENTS) into dedicated connection pools or databases. This will allow Polaris to better handle high-traffic environments by preventing "noisy neighbor" effects on the core entity tables.
> > > > > >
> > > > > > Key Highlights:
> > > > > >
> > > > > > - DataSourceResolver: A new pluggable interface for routing JDBC connections based on RealmContext and StoreType.
> > > > > > - Modular Design: Decoupled the resolution implementation into the runtime-common module.
> > > > > > - Consistency: Utilizes a type-safe StoreType enum and aligns with existing RealmContext patterns.
> > > > > >
> > > > > > The PR has been refined with feedback from @dimas-b and is now ready for community review. I'd appreciate any feedback on the overall approach.
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Subham Sangwan
> > > > > > GitHub: Subham-KRLX
> > > > > >
> > > > > > [1] https://github.com/apache/polaris/pull/3960
> > > > > > [2] https://github.com/apache/polaris/issues/3890
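The narrower, purpose-based v1 model discussed in this thread (config-driven, a small fixed set of store buckets, unmapped buckets falling back to the default DataSource, and the realm not influencing routing) can be sketched in Java roughly as follows. The METASTORE / METRICS / EVENTS buckets come from Subham's message; everything else is an assumption for illustration: the `resolve` signature, the plain `String` realm id standing in for `RealmContext`, the generic handle type `D`, and the `PurposeBasedDataSourceResolver` class are hypothetical and are not the API in PR #3960.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Objects;

// Purpose buckets named in the thread (METASTORE, METRICS, EVENTS).
// Hypothetical stand-in: the real StoreType enum in PR #3960 may differ.
enum StoreType { METASTORE, METRICS, EVENTS }

// Hypothetical SPI shape, not the PR's actual signature. The realm is a plain
// String instead of RealmContext, and D is whatever handle the runtime hands
// out (for example a javax.sql.DataSource).
interface DataSourceResolver<D> {
    D resolve(String realmId, StoreType storeType);
}

// Config-driven, purpose-based resolver: each bucket may be mapped to its own
// data source, and any unmapped bucket falls back to the default one.
// The realm argument is deliberately ignored, so all realms route the same way.
final class PurposeBasedDataSourceResolver<D> implements DataSourceResolver<D> {
    private final D defaultSource;
    private final Map<StoreType, D> byPurpose = new EnumMap<>(StoreType.class);

    PurposeBasedDataSourceResolver(D defaultSource, Map<StoreType, D> overrides) {
        this.defaultSource = Objects.requireNonNull(defaultSource, "defaultSource");
        // Copied once at construction; later changes to `overrides` have no effect.
        byPurpose.putAll(overrides);
    }

    @Override
    public D resolve(String realmId, StoreType storeType) {
        return byPurpose.getOrDefault(Objects.requireNonNull(storeType, "storeType"), defaultSource);
    }
}
```

For example, mapping only EVENTS to a dedicated pool leaves METASTORE and METRICS on the default data source for every realm, and an empty override map routes everything to the default, which mirrors the pre-#3960 behavior Dmitri describes. Because the mapping is fixed at construction, a given bucket resolves to the same data source for the resolver's lifetime; that is one simple way to keep routing stable across a unit of work, though it does not by itself settle the config / migration story.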

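Yufei's sixth question asks whether different data sources may point to the same schema and, if not, how to detect that misconfiguration. A minimal sketch of such a check follows, assuming each configured data source can be reduced to a name, a JDBC URL, and a schema name. The `DataSourceTarget` record and `SharedSchemaCheck` class are hypothetical and not part of Polaris.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified view of one configured data source.
record DataSourceTarget(String name, String jdbcUrl, String schema) {}

final class SharedSchemaCheck {
    private SharedSchemaCheck() {}

    // Reports every data source whose (JDBC URL, schema) pair was already
    // claimed by an earlier entry. An empty list means no overlap was found.
    // Simplification: URL and schema are compared verbatim, so equivalent
    // URLs spelled differently are not detected.
    static List<String> findSharedSchemas(List<DataSourceTarget> targets) {
        Map<List<String>, String> firstOwner = new HashMap<>();
        List<String> problems = new ArrayList<>();
        for (DataSourceTarget t : targets) {
            List<String> key = List.of(t.jdbcUrl(), t.schema());
            String previous = firstOwner.putIfAbsent(key, t.name());
            if (previous != null) {
                problems.add(t.name() + " shares schema " + t.schema()
                        + " at " + t.jdbcUrl() + " with " + previous);
            }
        }
        return problems;
    }
}
```

The check returns a list instead of throwing, so the caller can decide whether a shared schema is a hard startup error or only a warning, which is the policy question the thread leaves open.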