Hi EJ,

I'm afraid I do not really understand your concerns.

Do you have specific code changes in mind? It might be easier to proceed by comparing specific PRs.

Re: "built-in contract": from my POV, it is not well-defined at this point. There was some discussion about formalizing API / SPI boundaries, but it has not progressed yet. Unfortunately, I could not easily find an email thread for this. It certainly makes sense to continue that discussion, but I do not think it should be a blocker for in-progress PRs.

Currently, I believe the best expression of downstream user expectations about code is given by the Polaris Evolution doc [1].

[1] https://polaris.apache.org/in-dev/unreleased/evolution/

Cheers,
Dmitri.

On Thu, Mar 12, 2026 at 4:28 PM EJ Wang wrote:
> Thanks Dmitri! I agree the SPI itself is not the main concern here.
>
> IIUC, the question is less whether some realm-awareness exists somewhere in the implementation, and more what Polaris wants to make part of the initial built-in contract. My concern is that once the default story starts looking realm-driven, we may be implicitly hardening a broader model than we actually want to support yet.
>
> I’m still biased toward keeping the v1 built-in path narrower:
> - config-driven,
> - a small fixed set of purpose/store buckets,
> - and routing behavior that stays stable at the transaction / unit-of-work boundary.
>
> That still leaves room for a pluggable resolver SPI and downstream customization, but keeps OSS Polaris itself from committing too early to a more dynamic routing model before the config / migration / consistency story is fully settled. So I think the main thing I’d want to make more explicit is the boundary between:
> 1. the extensibility the SPI permits, and
> 2. the built-in routing model Polaris is actually recommending and prepared to support in v1.
>
> If we can make that boundary crisp, I’m much less worried about the abstraction itself.
>
> -ej
>
> On Thu, Mar 12, 2026 at 11:04 AM Dmitri Bourlatchkov wrote:
> > Hi EJ,
> >
> > In the current state of PR [3960], the DataSourceResolver interface enables customizing per-realm DataSource resolution.
> >
> > However, Apache Polaris code is not complicated by any new per-realm logic related to DataSources. The OSS side keeps working as before.
> >
> > I'm not sure whether Subham intended to delegate this logic to custom implementations or not, but as the PR stands the code looks pretty reasonable to me.
> >
> > In the current state of the codebase, I do not think we can completely avoid dealing with realms as they relate to DataSources. The decision about how to cross-link them has to be made somewhere. IMHO, the proposed DefaultDataSourceResolver looks like a good place to make that decision. It is certainly subject to further evolution if we need to make adjustments later.
> >
> > Before [3960], all realms implicitly received the default DataSource. This is now explicit in the code.
> >
> > [3960] https://github.com/apache/polaris/pull/3960
> >
> > Cheers,
> > Dmitri.
> >
> > On Thu, Mar 12, 2026 at 2:29 AM EJ Wang wrote:
> > > Hi Subham, Dmitri, Yufei, JB,
> > >
> > > I’m generally aligned with the direction here, and I left a few more detailed comments on the PR.
> > > At a high level, my main concern is that the current proposal may still be a bit ahead of the current story/scope. I’d lean toward keeping the first step narrower, preferring purpose-based routing over per-realm routing for v1, and making the supported model/config+migration story more explicit before the broader contract hardens.
> > >
> > > -ej
> > >
> > > On Wed, Mar 11, 2026 at 12:37 PM Jean-Baptiste Onofré wrote:
> > > > Hi Subham,
> > > >
> > > > Thanks for this contribution. It's an interesting feature.
> > > >
> > > > As mentioned in the GitHub issues, I am fine with moving forward with a PR as long as it remains a Draft PR to help drive the discussion. I suggest linking a GitHub Discussion or a Design Doc within the PR to help build consensus.
> > > >
> > > > That said, I have a few initial comments:
> > > >
> > > > 1. I like the SPI approach used in the PR. This should become a standard in Polaris to facilitate custom implementations.
> > > > 2. I agree that having a data source "per purpose" is a good idea. The main question is how we should handle the split:
> > > >    - Very granular (per entity)
> > > >    - By table "meaning"
> > > >    - By realm (this may not be granular enough)
> > > >
> > > > From a user standpoint, I believe we should keep it simple. It would be a great first step forward. For example, in the configuration (e.g., application.properties), it could look like:
> > > > - polaris.datasource.entities=
> > > > - polaris.datasource.events=
> > > > - polaris.datasource.grants=
> > > >
> > > > Regards,
> > > > JB
> > > >
> > > > On Tue, Mar 10, 2026 at 11:19 PM Yufei Gu wrote:
> > > > > Hi Subham,
> > > > >
> > > > > Thanks for working on this. Given the complexity and long-term implications discussed in https://github.com/apache/polaris/issues/3890, I think a short design doc could still be helpful to capture the intended architecture and future evolution. Here are a few questions listed in the issue. I believe these should be answered before jumping to an implementation.
> > > > >
> > > > > 1. Should we split each potentially noisy table into its own dedicated data source? For example, one data source for events, one for metrics, and one for idempotency.
> > > > > 2. Should we allow flexible grouping?
> > > > >    For example, events and idempotency tables share one data source, while metrics uses another.
> > > > > 3. Should we consider a different DS per realm instead of table-level splitting?
> > > > > 4. How should schema version information be managed? If tables live in different data sources, how do we track and coordinate schema evolution?
> > > > > 5. Should different data sources be allowed to point to different schemas or databases? This likely aligns with the isolation goal, but it implies that cross-table joins become difficult or impossible at the database level, leaving only in-memory joins as an option.
> > > > > 6. Should different data sources be allowed to point to the same schema? If not, we need validation logic to detect and prevent misconfiguration.
> > > > >
> > > > > Yufei
> > > > >
> > > > > On Tue, Mar 10, 2026 at 7:33 AM Dmitri Bourlatchkov wrote:
> > > > > > Hi Subham,
> > > > > >
> > > > > > Thanks again for your contribution!
> > > > > >
> > > > > > I believe PR 3960 moves in the right direction by establishing an SPI to delegate DataSource resolution logic to the runtime environment.
> > > > > >
> > > > > > It immediately allows custom implementations in downstream projects (if people wish to do that) and opens a way for supporting multiple DataSources in Apache Polaris (in follow-up PRs).
> > > > > >
> > > > > > I think the PR is pretty clear in itself and does not require any extra design docs. Let's review it in GH and merge when we have consensus.
> > > > > >
> > > > > > Cheers,
> > > > > > Dmitri.
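Yufei's question 6 (together with the grouping in question 2) can be made concrete with a small validation sketch. It assumes, hypothetically, that each purpose key (such as `entities`, `events`, `grants` from JB's `polaris.datasource.*` example) maps to a named data source, and that each named data source maps to a JDBC URL and schema. `DataSourceConfigCheck`, `Target`, and the sample URLs are illustrative names, not part of PR #3960. The sketch also assumes the answer to question 6 is "no": purposes that intentionally share one named data source pass, while two different names pointing at the same URL and schema are reported.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical config check (not Polaris code): flags two *different* named
// data sources that resolve to the same JDBC URL + schema. Purposes that
// deliberately share one named data source are accepted.
final class DataSourceConfigCheck {

  // Hypothetical connection target of one named data source.
  record Target(String jdbcUrl, String schema) {}

  static List<String> validate(
      Map<String, String> purposeToDataSource, Map<String, Target> dataSourceTargets) {
    List<String> problems = new ArrayList<>();
    // Group the distinct data source names in use by their target; iterating
    // names in sorted order keeps the report order deterministic.
    Map<Target, List<String>> namesByTarget = new LinkedHashMap<>();
    for (String name : new TreeSet<>(purposeToDataSource.values())) {
      Target target = dataSourceTargets.get(name);
      if (target == null) {
        problems.add("undefined data source: " + name);
        continue;
      }
      namesByTarget.computeIfAbsent(target, t -> new ArrayList<>()).add(name);
    }
    // Any target reached through more than one name is a potential misconfiguration.
    namesByTarget.forEach(
        (target, names) -> {
          if (names.size() > 1) {
            problems.add(
                "data sources " + names + " share " + target.jdbcUrl()
                    + " schema " + target.schema());
          }
        });
    return problems;
  }

  public static void main(String[] args) {
    Map<String, Target> targets =
        Map.of(
            "main", new Target("jdbc:postgresql://db1/polaris", "polaris"),
            "noisy", new Target("jdbc:postgresql://db2/polaris", "polaris"),
            "noisy-copy", new Target("jdbc:postgresql://db2/polaris", "polaris"));

    // Events and idempotency deliberately share "noisy": prints []
    System.out.println(
        validate(Map.of("entities", "main", "events", "noisy", "idempotency", "noisy"), targets));
    // Metrics uses a second name for the same schema: prints
    // [data sources [noisy, noisy-copy] share jdbc:postgresql://db2/polaris schema polaris]
    System.out.println(
        validate(Map.of("entities", "main", "events", "noisy", "metrics", "noisy-copy"), targets));
  }
}
```

Keying the check on the named data source rather than on the purpose is what lets question 2's intentional grouping coexist with question 6's rule.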
> > > > > >
> > > > > > On Tue, Mar 10, 2026 at 8:27 AM Subham Sangwan wrote:
> > > > > > > Hi Polaris Dev Team,
> > > > > > >
> > > > > > > I have opened PR #3960 [1] to introduce the foundational groundwork for multi-datasource support in JDBC persistence, addressing Issue #3890 [2]. The goal is to enable physical isolation of different persistence workloads (METASTORE, METRICS, EVENTS) into dedicated connection pools or databases. This will allow Polaris to better handle high-traffic environments by preventing "noisy neighbor" effects on the core entity tables.
> > > > > > >
> > > > > > > Key Highlights:
> > > > > > >
> > > > > > > - DataSourceResolver: A new pluggable interface for routing JDBC connections based on RealmContext and StoreType.
> > > > > > > - Modular Design: Decoupled the resolution implementation into the runtime-common module.
> > > > > > > - Consistency: Utilizes a type-safe StoreType enum and aligns with existing RealmContext patterns.
> > > > > > >
> > > > > > > The PR has been refined with feedback from @dimas-b and is now ready for community review. I'd appreciate any feedback on the overall approach.
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Subham Sangwan
> > > > > > > GitHub: Subham-KRLX
> > > > > > >
> > > > > > > [1] https://github.com/apache/polaris/pull/3960
> > > > > > > [2] https://github.com/apache/polaris/issues/3890
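The SPI described in Subham's announcement, together with Dmitri's note that all realms previously received the default DataSource, can be sketched as follows. This is a minimal, hedged illustration, not the code in PR #3960. The method signature, the plain `String` realm id standing in for `RealmContext`, the type parameter standing in for `javax.sql.DataSource`, and the class names `AllRealmsDefaultResolver` and `PurposeResolver` are all assumptions made so the sketch runs standalone. `StoreType` reuses the three workload names from the announcement.

```java
import java.util.Map;
import java.util.Objects;

// Entry point declared first so the single-file source launcher picks it up.
final class ResolverSketch {
  public static void main(String[] args) {
    DataSourceResolver<String> allDefault = new AllRealmsDefaultResolver<>("default-pool");
    DataSourceResolver<String> byPurpose =
        new PurposeResolver<>(Map.of(StoreType.EVENTS, "events-pool"), "default-pool");

    System.out.println(allDefault.resolve("realm-a", StoreType.EVENTS)); // default-pool
    System.out.println(byPurpose.resolve("realm-a", StoreType.EVENTS)); // events-pool
    System.out.println(byPurpose.resolve("realm-b", StoreType.METRICS)); // default-pool
  }
}

// Workload categories named in the PR announcement.
enum StoreType { METASTORE, METRICS, EVENTS }

// Hypothetical, simplified SPI shape: the PR's interface is described as routing
// on RealmContext and StoreType; here the realm is a plain String and the
// returned data source type is a type parameter (javax.sql.DataSource in real code).
interface DataSourceResolver<D> {
  D resolve(String realmId, StoreType storeType);
}

// Hypothetical: mirrors the pre-#3960 behavior Dmitri describes, where every
// realm (and every store type) receives the single default data source.
final class AllRealmsDefaultResolver<D> implements DataSourceResolver<D> {
  private final D defaultDataSource;

  AllRealmsDefaultResolver(D defaultDataSource) {
    this.defaultDataSource = Objects.requireNonNull(defaultDataSource);
  }

  @Override
  public D resolve(String realmId, StoreType storeType) {
    return defaultDataSource;
  }
}

// Hypothetical purpose-only routing in the spirit of EJ's narrower v1: the realm
// is ignored, and store types without a dedicated pool fall back to the default.
final class PurposeResolver<D> implements DataSourceResolver<D> {
  private final Map<StoreType, D> byPurpose;
  private final D fallback;

  PurposeResolver(Map<StoreType, D> byPurpose, D fallback) {
    this.byPurpose = Map.copyOf(byPurpose);
    this.fallback = Objects.requireNonNull(fallback);
  }

  @Override
  public D resolve(String realmId, StoreType storeType) {
    return byPurpose.getOrDefault(storeType, fallback);
  }
}
```

`PurposeResolver` reads only static configuration, so for a given store type it returns the same pool on every call, which keeps routing stable across a transaction. A realm-aware downstream implementation can still plug into the same interface, which is roughly the SPI-versus-built-in boundary EJ asks to make explicit.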
