Hi Subham, Dmitri, Yufei, JB,

I'm generally aligned with the direction here, and I left a few more detailed comments on the PR. At a high level, my main concern is that the current proposal may still be a bit ahead of the agreed story and scope. I'd lean toward keeping the first step narrower: prefer purpose-based routing over per-realm routing for v1, and make the supported model, configuration, and migration story more explicit before the broader contract hardens.
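To make "purpose-based routing" concrete, here is a minimal sketch of what a v1 could look like. It is not the code from the PR: the class `PurposeBasedResolver`, the `polaris.datasource.default` key, and the lowercase per-purpose keys are hypothetical names chosen for illustration, and the `StoreType` values are only the three workloads named in the PR description. The resolver deliberately ignores the realm and falls back to a single default data source when a purpose has no dedicated entry.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

public class PurposeRoutingSketch {

  /** Workloads named in the PR description; the real enum may differ. */
  enum StoreType { METASTORE, METRICS, EVENTS }

  /** Hypothetical resolver: maps a purpose to a configured data source name. */
  static final class PurposeBasedResolver {
    private final Map<StoreType, String> byPurpose = new EnumMap<>(StoreType.class);
    private final String defaultName;

    PurposeBasedResolver(Properties config) {
      // Assumed key; used when a purpose has no dedicated data source.
      defaultName = config.getProperty("polaris.datasource.default", "default");
      for (StoreType type : StoreType.values()) {
        // Assumed key shape, e.g. polaris.datasource.events=events-db
        String key = "polaris.datasource." + type.name().toLowerCase(Locale.ROOT);
        String value = config.getProperty(key);
        if (value != null && !value.isBlank()) {
          byPurpose.put(type, value.trim());
        }
      }
    }

    /** Routes by purpose only; no realm is consulted in this sketch. */
    String resolve(StoreType type) {
      return byPurpose.getOrDefault(type, defaultName);
    }
  }

  public static void main(String[] args) {
    Properties config = new Properties();
    config.setProperty("polaris.datasource.events", "events-db");

    PurposeBasedResolver resolver = new PurposeBasedResolver(config);
    System.out.println(resolver.resolve(StoreType.EVENTS));    // dedicated entry
    System.out.println(resolver.resolve(StoreType.METASTORE)); // falls back to default
  }
}
```

With a fallback like this, an existing single-database deployment would need no configuration change, which keeps the migration story small for a first step.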
-ej

On Wed, Mar 11, 2026 at 12:37 PM Jean-Baptiste Onofré <[email protected]> wrote:

> Hi Subham,
>
> Thanks for this contribution. It's an interesting feature.
>
> As mentioned in the GitHub issues, I am fine with moving forward with a
> PR as long as it remains a Draft PR to help drive the discussion. I
> suggest linking a GitHub Discussion or a Design Doc within the PR to
> help build consensus.
>
> That said, I have a few initial comments:
>
> 1. I like the SPI approach used in the PR. This should become a
>    standard in Polaris to facilitate custom implementations.
> 2. I agree that having a data source "per purpose" is a good idea. The
>    main question is how we should handle the split:
>    - Very granular (per entity)
>    - By table "meaning"
>    - By realm (this may not be granular enough)
>
> From a user standpoint, I believe we should keep it simple. It would be
> a great first step forward. For example, in the configuration (e.g.,
> application.properties), it could look like:
> - polaris.datasource.entities=
> - polaris.datasource.events=
> - polaris.datasource.grants=
>
> Regards,
> JB
>
> On Tue, Mar 10, 2026 at 11:19 PM Yufei Gu <[email protected]> wrote:
>
> > Hi Subham,
> >
> > Thanks for working on this. Given the complexity and long-term
> > implications discussed in
> > https://github.com/apache/polaris/issues/3890, I think a short design
> > doc could still be helpful to capture the intended architecture and
> > future evolution. Here are a few questions listed in the issue. I
> > believe these should be answered before jumping to an implementation.
> >
> > 1. Should we split each potentially noisy table into its own
> >    dedicated data source? For example, one data source for events,
> >    one for metrics, and one for idempotency.
> > 2. Should we allow flexible grouping? For example, events and
> >    idempotency tables sharing one data source, while metrics uses
> >    another.
> > 3. Should we consider different DS per realm instead of table-level
> >    splitting?
> > 4. How should schema version information be managed? If tables live
> >    in different data sources, how do we track and coordinate schema
> >    evolution?
> > 5. Should different data sources be allowed to point to different
> >    schemas or databases? This likely aligns with the isolation goal,
> >    but it implies that cross-table joins become difficult or
> >    impossible at the database level, leaving only in-memory joins as
> >    an option.
> > 6. Should different data sources be allowed to point to the same
> >    schema? If not, we need validation logic to detect and prevent
> >    misconfiguration.
> >
> > Yufei
> >
> > On Tue, Mar 10, 2026 at 7:33 AM Dmitri Bourlatchkov <[email protected]>
> > wrote:
> >
> > > Hi Subham,
> > >
> > > Thanks again for your contribution!
> > >
> > > I believe PR 3960 moves in the right direction by establishing an
> > > SPI to delegate DataSource resolution logic to the runtime
> > > environment.
> > >
> > > It immediately allows custom implementations in downstream projects
> > > (if people wish to do that) and opens a way for supporting multiple
> > > DataSources in Apache Polaris (in follow-up PRs).
> > >
> > > I think the PR is pretty clear in itself and does not require any
> > > extra design docs. Let's review it in GH and merge when we have
> > > consensus.
> > >
> > > Cheers,
> > > Dmitri.
> > >
> > > On Tue, Mar 10, 2026 at 8:27 AM Subham Sangwan <[email protected]>
> > > wrote:
> > >
> > > > Hi Polaris Dev Team,
> > > >
> > > > I have opened PR #3960 [1] to introduce the foundational
> > > > groundwork for multi-datasource support in JDBC persistence,
> > > > addressing Issue #3890 [2]. The goal is to enable physical
> > > > isolation of different persistence workloads (METASTORE, METRICS,
> > > > EVENTS) into dedicated connection pools or databases. This will
> > > > allow Polaris to better handle high-traffic environments by
> > > > preventing "noisy neighbor" effects on the core entity tables.
> > > >
> > > > Key Highlights:
> > > >
> > > > - DataSourceResolver: A new pluggable interface for routing JDBC
> > > >   connections based on RealmContext and StoreType.
> > > > - Modular Design: Decoupled the resolution implementation into
> > > >   the runtime-common module.
> > > > - Consistency: Utilizes a type-safe StoreType enum and aligns
> > > >   with existing RealmContext patterns.
> > > >
> > > > The PR has been refined with feedback from @dimas-b and is now
> > > > ready for community review. I'd appreciate any feedback on the
> > > > overall approach.
> > > >
> > > > Best regards,
> > > >
> > > > Subham Sangwan
> > > > GitHub: Subham-KRLX
> > > >
> > > > [1] https://github.com/apache/polaris/pull/3960
> > > > [2] https://github.com/apache/polaris/issues/3890
