Hi EJ,

I'm afraid I do not really understand your concerns.

Do you have specific code changes in mind? It might be easier to proceed by
comparing specific PRs.

Re: "built-in contract": from my POV, it is not well-defined right now.

There was some discussion about formalizing API / SPI boundaries, but it
has not progressed yet. Unfortunately, I could not easily find an email
thread for this. It certainly makes sense to continue that discussion, but
I do not think it should be a blocker for in-progress PRs.

Currently, I believe the best expression of downstream user expectations
about code is given by the Polaris Evolution doc [1].

[1] https://polaris.apache.org/in-dev/unreleased/evolution/

Cheers,
Dmitri.

On Thu, Mar 12, 2026 at 4:28 PM EJ Wang <[email protected]>
wrote:

> Thanks Dmitri! I agree the SPI itself is not the main concern here.
>
> IIUC, the question is less whether some realm-awareness exists somewhere in
> the implementation, and more what Polaris wants to make part of the initial
> built-in contract. My concern is that once the default story starts looking
> realm-driven, we may be implicitly hardening a broader model than we
> actually want to support yet.
>
> I’m still biased toward keeping the v1 built-in path narrower:
> - config-driven,
> - a small fixed set of purpose/store buckets,
> - and routing behavior that stays stable at the transaction / unit-of-work
> boundary.
>
> That still leaves room for a pluggable resolver SPI and downstream
> customization, but keeps OSS Polaris itself from committing too early to a
> more dynamic routing model before the config / migration / consistency
> story is fully settled. So I think the main thing I’d want to make more
> explicit is the boundary between:
> 1. extensibility the SPI permits, and
> 2. the built-in routing model Polaris is actually recommending and prepared
> to support in v1.
>
> If we can make that boundary crisp, I’m much less worried about the
> abstraction itself.
>
> -ej
>
> On Thu, Mar 12, 2026 at 11:04 AM Dmitri Bourlatchkov <[email protected]>
> wrote:
>
> > Hi EJ,
> >
> > In the current state of PR [3960], the DataSourceResolver interface
> > enables customizing per-realm DataSource resolution.
> >
> > However, Apache Polaris code is not complicated by any new per-realm
> > logic related to DataSources. The OSS side keeps working as before.
> >
> > I'm not sure whether Subham intended to delegate this logic to custom
> > implementations or not, but as the PR stands the code looks pretty
> > reasonable to me.
> >
> > In the current state of the codebase, I do not think we can completely
> > avoid dealing with realms as they relate to DataSources. The decision
> > about how to cross-link them has to be made somewhere. IMHO, the
> > proposed DefaultDataSourceResolver looks like a good place to make that
> > decision. It is certainly subject to further evolution if we need to make
> > adjustments later.
> >
> > Before [3960] all realms implicitly received the default DataSource. This
> > is now explicit in the code.
> >
> > [3960] https://github.com/apache/polaris/pull/3960
> >
> > Cheers,
> > Dmitri.
> >
> > On Thu, Mar 12, 2026 at 2:29 AM EJ Wang <[email protected]>
> > wrote:
> >
> > > Hi Subham, Dmitri, Yufei, JB,
> > >
> > > I’m generally aligned with the direction here, and I left a few more
> > > detailed comments on the PR.
> > > At a high level, my main concern is that the current proposal may still
> > > be a bit ahead of the current story/scope. I’d lean toward keeping the
> > > first step narrower, preferring purpose-based routing over per-realm
> > > routing for v1, and making the supported model/config+migration story
> > > more explicit before the broader contract hardens.
> > >
> > > -ej
> > >
> > > On Wed, Mar 11, 2026 at 12:37 PM Jean-Baptiste Onofré <[email protected]>
> > > wrote:
> > >
> > > > Hi Subham,
> > > >
> > > > Thanks for this contribution. It's an interesting feature.
> > > >
> > > > As mentioned in the GitHub issues, I am fine with moving forward with
> > > > a PR as long as it remains a Draft PR to help drive the discussion. I
> > > > suggest linking a GitHub Discussion or a Design Doc within the PR to
> > > > help build consensus.
> > > >
> > > > That said, I have a few initial comments:
> > > >
> > > > 1. I like the SPI approach used in the PR. This should become a
> > > > standard in Polaris to facilitate custom implementations.
> > > > 2. I agree that having a data source "per purpose" is a good idea. The
> > > > main question is how we should handle the split:
> > > > - Very granular (per entity)
> > > > - By table "meaning"
> > > > - By realm (this may not be granular enough)
> > > >
> > > > From a user standpoint, I believe we should keep it simple. It would
> > > > be a great first step forward. For example, in the configuration (e.g.,
> > > > application.properties), it could look like:
> > > > - polaris.datasource.entities=
> > > > - polaris.datasource.events=
> > > > - polaris.datasource.grants=
> > > >
> > > > Regards,
> > > > JB
> > > >
> > > > On Tue, Mar 10, 2026 at 11:19 PM Yufei Gu <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi Subham,
> > > > >
> > > > > Thanks for working on this. Given the complexity and long term
> > > > > implications discussed in
> > > > > https://github.com/apache/polaris/issues/3890, I think a short
> > > > > design doc could still be helpful to capture the intended
> > > > > architecture and future evolution. Here are a few questions listed
> > > > > in the issue. I believe these should be answered before jumping to
> > > > > an implementation.
> > > > >
> > > > >
> > > > >    1. Should we split each potential noisy table into its own
> > > > >    dedicated data source? For example, one data source for events,
> > > > >    one for metrics, and one for idempotency.
> > > > >    2. Should we allow flexible grouping? For example, events and
> > > > >    idempotency tables sharing one data source, while metrics uses
> > > > >    another.
> > > > >    3. Should we consider different DS per realm instead of
> > > > >    table-level splitting?
> > > > >    4. How should schema version information be managed? If tables
> > > > >    live in different data sources, how do we track and coordinate
> > > > >    schema evolution?
> > > > >    5. Should different data sources be allowed to point to
> > > > >    different schemas or databases? This likely aligns with the
> > > > >    isolation goal, but it implies that cross-table joins become
> > > > >    difficult or impossible at the database level, leaving only
> > > > >    in-memory joins as an option.
> > > > >    6. Should different data sources be allowed to point to the same
> > > > >    schema? If not, we need validation logic to detect and prevent
> > > > >    misconfiguration.
> > > > >
> > > > >
> > > > > Yufei
> > > > >
> > > > >
> > > > > On Tue, Mar 10, 2026 at 7:33 AM Dmitri Bourlatchkov
> > > > > <[email protected]> wrote:
> > > > >
> > > > > > Hi Subham,
> > > > > >
> > > > > > Thanks again for your contribution!
> > > > > >
> > > > > > I believe PR 3960 moves in the right direction by establishing an
> > > > > > SPI to delegate DataSource resolution logic to the runtime
> > > > > > environment.
> > > > > >
> > > > > > It immediately allows custom implementations in downstream
> > > > > > projects (if people wish to do that) and opens a way for supporting
> > > > > > multiple DataSources in Apache Polaris (in follow-up PRs).
> > > > > >
> > > > > > I think the PR is pretty clear in itself and does not require any
> > > > > > extra design docs. Let's review it in GH and merge when we have
> > > > > > consensus.
> > > > > >
> > > > > > Cheers,
> > > > > > Dmitri.
> > > > > >
> > > > > > On Tue, Mar 10, 2026 at 8:27 AM Subham Sangwan
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Polaris Dev Team,
> > > > > > >
> > > > > > > I have opened PR #3960 [1] to introduce the foundational
> > > > > > > groundwork for multi-datasource support in JDBC persistence,
> > > > > > > addressing Issue #3890 [2]. The goal is to enable physical
> > > > > > > isolation of different persistence workloads (METASTORE, METRICS,
> > > > > > > EVENTS) into dedicated connection pools or databases. This will
> > > > > > > allow Polaris to better handle high-traffic environments by
> > > > > > > preventing "noisy neighbor" effects on the core entity tables.
> > > > > > >
> > > > > > > Key Highlights:
> > > > > > >
> > > > > > >    - DataSourceResolver: A new pluggable interface for routing
> > > > > > >    JDBC connections based on RealmContext and StoreType.
> > > > > > >    - Modular Design: Decoupled the resolution implementation into
> > > > > > >    the runtime-common module.
> > > > > > >    - Consistency: Utilizes a type-safe StoreType enum and aligns
> > > > > > >    with existing RealmContext patterns.
> > > > > > >
> > > > > > > The PR has been refined with feedback from @dimas-b and is now
> > > > > > > ready for community review. I'd appreciate any feedback on the
> > > > > > > overall approach.
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Subham Sangwan
> > > > > > > GitHub: Subham-KRLX
> > > > > > >
> > > > > > > [1] https://github.com/apache/polaris/pull/3960
> > > > > > > [2] https://github.com/apache/polaris/issues/3890
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
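
For readers following the thread, the narrower option raised above (config-driven, a small fixed set of purpose buckets, realm accepted but not used for routing) could be sketched roughly as follows. This is only an illustration under stated assumptions and is not code from PR 3960: the method signature, the use of a plain String realm id instead of RealmContext, the returned data source *name* instead of a javax.sql.DataSource, the class PurposeBasedResolver, the "default" fallback name, and the exact property keys are all hypothetical. Only the StoreType values and the polaris.datasource.* key prefix are taken from the messages above.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Workload buckets named in the original announcement.
enum StoreType { METASTORE, METRICS, EVENTS }

// Assumed shape of a resolver SPI; the real interface in the PR may differ.
interface DataSourceResolver {
  // Returns the *name* of the data source to use (hypothetical simplification).
  String resolve(String realmId, StoreType storeType);
}

// Hypothetical built-in resolver: routes by purpose only, driven by config.
final class PurposeBasedResolver implements DataSourceResolver {
  static final String DEFAULT_NAME = "default"; // assumed fallback name

  private final Map<StoreType, String> byPurpose = new EnumMap<>(StoreType.class);

  PurposeBasedResolver(Properties config) {
    for (StoreType type : StoreType.values()) {
      // e.g. polaris.datasource.events=events-ds (key names are an assumption)
      String key = "polaris.datasource." + type.name().toLowerCase(Locale.ROOT);
      String name = config.getProperty(key);
      if (name != null && !name.isBlank()) {
        byPurpose.put(type, name.trim());
      }
    }
  }

  @Override
  public String resolve(String realmId, StoreType storeType) {
    // The realm is part of the signature but deliberately ignored here, so
    // every realm gets the same data source for a given purpose.
    return byPurpose.getOrDefault(storeType, DEFAULT_NAME);
  }
}
```

With only polaris.datasource.events set, EVENTS would resolve to the configured name for every realm, while METASTORE and METRICS would fall back to the default. A downstream implementation of the same interface could still route per realm without changing this built-in behavior.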
