Hi Subham,

Thanks for working on this. Given the complexity and long-term implications
discussed in https://github.com/apache/polaris/issues/3890, I think a short
design doc would still be helpful to capture the intended architecture and
future evolution. Here are a few questions from the issue that I believe
should be answered before jumping to an implementation.


   1. Should we split each potentially noisy table into its own dedicated
   data source? For example, one data source for events, one for metrics,
   and one for idempotency.
   2. Should we allow flexible grouping? For example, the events and
   idempotency tables share one data source, while metrics uses another.
   3. Should we consider a different data source per realm instead of
   table-level splitting?
   4. How should schema version information be managed? If tables live in
   different data sources, how do we track and coordinate schema evolution?
   5. Should different data sources be allowed to point to different
   schemas or databases? This likely aligns with the isolation goal, but it
   implies that cross-table joins become difficult or impossible at the
   database level, leaving only in-memory joins as an option.
   6. Should different data sources be allowed to point to the same schema?
   If not, we need validation logic to detect and prevent misconfiguration.
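
To make questions 2 and 6 more concrete, here is a rough Java sketch of
flexible grouping plus a same-schema check. It is not taken from PR 3960:
Routing, DataSourceSpec, and validateNoSharedSchema are hypothetical names
used only for illustration, the JDBC URLs and schema names are made up, and
StoreType merely mirrors the workloads named in the PR description.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

public class DataSourceGroupingSketch {

  // Assumption: mirrors the workloads named in the PR description; the real enum may differ.
  enum StoreType { METASTORE, METRICS, EVENTS }

  // Hypothetical description of one configured data source.
  static final class DataSourceSpec {
    final String name;
    final String jdbcUrl;
    final String schema;

    DataSourceSpec(String name, String jdbcUrl, String schema) {
      this.name = name;
      this.jdbcUrl = jdbcUrl;
      this.schema = schema;
    }
  }

  // Hypothetical routing table: each store type maps to exactly one data source,
  // but several store types may share the same one (flexible grouping).
  static final class Routing {
    private final Map<StoreType, DataSourceSpec> byStoreType = new EnumMap<>(StoreType.class);

    Routing route(StoreType type, DataSourceSpec spec) {
      byStoreType.put(type, spec);
      return this;
    }

    DataSourceSpec resolve(StoreType type) {
      DataSourceSpec spec = byStoreType.get(type);
      if (spec == null) {
        throw new IllegalStateException("No data source configured for " + type);
      }
      return spec;
    }

    // Rejects two differently named data sources that target the same URL and schema.
    void validateNoSharedSchema() {
      Map<String, String> ownerByTarget = new HashMap<>();
      for (DataSourceSpec spec : byStoreType.values()) {
        String target = spec.jdbcUrl + "#" + spec.schema;
        String owner = ownerByTarget.putIfAbsent(target, spec.name);
        if (owner != null && !owner.equals(spec.name)) {
          throw new IllegalArgumentException(
              "Data sources '" + owner + "' and '" + spec.name + "' both point to " + target);
        }
      }
    }
  }

  public static void main(String[] args) {
    DataSourceSpec core =
        new DataSourceSpec("core", "jdbc:postgresql://db-a/polaris", "polaris_core");
    DataSourceSpec noisy =
        new DataSourceSpec("noisy", "jdbc:postgresql://db-b/polaris", "polaris_noisy");

    // METRICS and EVENTS are grouped on one data source; METASTORE is isolated.
    Routing routing = new Routing()
        .route(StoreType.METASTORE, core)
        .route(StoreType.METRICS, noisy)
        .route(StoreType.EVENTS, noisy);

    routing.validateNoSharedSchema();
    System.out.println(routing.resolve(StoreType.EVENTS).name);
  }
}
```

Whether a check like this should compare URLs, schemas, or something
stricter is exactly the kind of decision a design doc could settle.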


Yufei


On Tue, Mar 10, 2026 at 7:33 AM Dmitri Bourlatchkov <[email protected]>
wrote:

> Hi Subham,
>
> Thanks again for your contribution!
>
> I believe PR 3960 moves in the right direction by establishing an SPI to
> delegate DataSource resolution logic to the runtime environment.
>
> It immediately allows custom implementations in downstream projects (if
> people wish to do that) and opens a way for supporting multiple DataSources
> in Apache Polaris (in follow-up PRs).
>
> I think the PR is pretty clear in itself and does not require any extra
> design docs. Let's review it in GH and merge when we have consensus.
>
> Cheers,
> Dmitri.
>
> On Tue, Mar 10, 2026 at 8:27 AM Subham Sangwan <
> [email protected]>
> wrote:
>
> > Hi Polaris Dev Team,
> >
> > I have opened PR #3960 [1] to introduce the foundational groundwork for
> > multi-datasource support in JDBC persistence, addressing Issue #3890 [2].
> > The goal is to enable physical isolation of different persistence
> > workloads (METASTORE, METRICS, EVENTS) into dedicated connection pools or
> > databases. This will allow Polaris to better handle high-traffic
> > environments by preventing "noisy neighbor" effects on the core entity
> > tables.
> >
> > Key Highlights:
> >
> >    - DataSourceResolver: A new pluggable interface for routing JDBC
> >    connections based on RealmContext and StoreType.
> >    - Modular Design: Decoupled the resolution implementation into the
> >    runtime-common module.
> >    - Consistency: Utilizes a type-safe StoreType enum and aligns with
> >    existing RealmContext patterns.
> >
> > The PR has been refined with feedback from @dimas-b and is now ready for
> > community review. I'd appreciate any feedback on the overall approach.
> >
> > Best regards,
> >
> > Subham Sangwan
> > GitHub: Subham-KRLX
> >
> > [1] https://github.com/apache/polaris/pull/3960
> > [2] https://github.com/apache/polaris/issues/3890
> >
>
