Re: [DISCUSS] Generic table delegation strategy in Polaris SparkCatalog

Dmitri Bourlatchkov Mon, 25 May 2026 15:52:47 -0700

Hi I-Ting,

What do you mean by "table already exists in Paimon"?


Do you mean a Generic Table in Polaris terminology?

Thanks,
Dmitri.

On Sat, May 23, 2026 at 12:15 PM ITing Lee <[email protected]> wrote:

> Hi all,
>
> After reviewing the PR again, I think we can make the Paimon and
> Polaris integration idempotent in a further improvement.
>
> The proposed flow is:
>
> 1. Check the Polaris metadata record first as an early return path.
>    * If the table already exists in Polaris, return/load the table.
>
> 2. Check Paimon.
>    * If the table already exists in Paimon, skip creation.
>    * If the table does not exist in Paimon, create the namespace in Paimon
> if needed, then create the table in Paimon.
>
> 3. Register the table in Polaris.
>
> With this approach, even if step 2 succeeds but step 3 fails, we can return
> a detailed exception to the client and allow the client to retry. This
> should make table creation across both systems idempotent.
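
For illustration, the three-step flow above can be sketched as follows. This is a minimal sketch, not the code in the PR: FakePaimon, FakePolaris, RegistrationError, and every method name are hypothetical in-memory stand-ins, and wrapping any registration failure in one exception type is an assumption.

```python
class RegistrationError(Exception):
    """Step 3 failed after step 2 succeeded; the caller may retry."""


class FakePaimon:
    """In-memory stand-in for a Paimon catalog (hypothetical interface)."""

    def __init__(self):
        self.namespaces = set()
        self.tables = set()
        self.create_calls = 0

    def table_exists(self, namespace, name):
        return (namespace, name) in self.tables

    def create_namespace_if_missing(self, namespace):
        self.namespaces.add(namespace)

    def create_table(self, namespace, name):
        self.create_calls += 1
        self.tables.add((namespace, name))


class FakePolaris:
    """In-memory stand-in for Polaris table metadata (hypothetical interface)."""

    def __init__(self, failures=0):
        self.records = {}
        self.failures = failures  # number of register calls that should fail

    def load_table(self, namespace, name):
        return self.records.get((namespace, name))

    def register_table(self, namespace, name, provider):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("simulated Polaris outage")
        record = {"namespace": namespace, "name": name, "provider": provider}
        self.records[(namespace, name)] = record
        return record


def create_table(polaris, paimon, namespace, name):
    # Step 1: early return if Polaris already has a metadata record.
    existing = polaris.load_table(namespace, name)
    if existing is not None:
        return existing

    # Step 2: create the table in Paimon only if it is missing there.
    if not paimon.table_exists(namespace, name):
        paimon.create_namespace_if_missing(namespace)
        paimon.create_table(namespace, name)

    # Step 3: register in Polaris; on failure, report a retryable error.
    try:
        return polaris.register_table(namespace, name, provider="paimon")
    except Exception as exc:
        raise RegistrationError(
            f"{namespace}.{name} exists in Paimon but is not registered "
            "in Polaris; retrying create_table is safe"
        ) from exc
```

In this sketch, a retry after a failed step 3 finds the table already present in Paimon, skips creation, and goes straight to registration, so the Paimon table is created at most once.
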
>
> If this makes sense, I can make this improvement in a follow-up PR.
> Thanks.
>
> Best regards,
> I-Ting
>
> On Fri, May 22, 2026 at 3:18 AM Dmitri Bourlatchkov <[email protected]> wrote:
>
> > Hi All,
> >
> > I'm bumping this thread because PR [3820] was mentioned here before.
> >
> > This discussion is interesting and useful. Still, on the practical side,
> > how do you feel about merging [3820] now and working on Paimon-related
> > improvements in follow-up PRs? Any objections?
> >
> > [3820] https://github.com/apache/polaris/pull/3820
> >
> > Thanks,
> > Dmitri.
> >
> > On Sat, Mar 14, 2026 at 3:34 AM 李宜頲 <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > We are adding support for Paimon inside Polaris's SparkCatalog. Before
> > > we add more formats, we would like to get community input on the
> > > intended architecture.
> > >
> > > This discussion originated from a code review conversation in PR #3820
> > > <https://github.com/apache/polaris/pull/3820#discussion_r2865885791>
> > >
> > >
> > >
> > > *Current design*
> > >
> > > When SparkCatalog.loadTable is called, the routing works in three
> > > phases:
> > >
> > >
> > > 1. Try the Iceberg catalog (icebergSparkCatalog.loadTable). If it
> > > succeeds, return immediately.
> > >
> > > 2. Call getTableFormat(ident), which makes a single HTTP GET to the
> > > Polaris server to read the provider property stored in the generic
> > > table metadata, without triggering any Spark DataSource resolution.
> > >
> > > 3. Route based on the provider string:
> > >
> > >     - "paimon"  : delegate to Paimon's SparkCatalog
> > >
> > >     - unknown/other : fall back to polarisSparkCatalog.loadTable, which
> > > performs full DataSource resolution
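
The three phases above can be sketched as follows. This is a minimal sketch in Python, not the Polaris Java code: DictCatalog, NoSuchTable, and the function signature are hypothetical, and treating a missing Iceberg table as a NoSuchTable exception is an assumption about how phase 1 fails.

```python
class NoSuchTable(Exception):
    """Hypothetical signal that a catalog does not contain the table."""


class DictCatalog:
    """In-memory stand-in for a sub-catalog (hypothetical interface)."""

    def __init__(self, label, tables):
        self.label = label
        self.tables = set(tables)

    def load_table(self, ident):
        if ident not in self.tables:
            raise NoSuchTable(ident)
        return (self.label, ident)


def load_table(ident, iceberg, get_table_format, paimon, polaris):
    # Phase 1: try the Iceberg catalog and return immediately on success.
    try:
        return iceberg.load_table(ident)
    except NoSuchTable:
        pass

    # Phase 2: one metadata lookup for the provider string.
    provider = get_table_format(ident)

    # Phase 3: route on the provider; anything unknown takes the fallback.
    if provider == "paimon":
        return paimon.load_table(ident)
    return polaris.load_table(ident)
```

Because alterTable and dropTable repeat the same three phases, each of them would carry its own copy of the phase 3 branch in this shape.
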
> > >
> > >
> > > The same three-phase pattern is repeated independently in loadTable,
> > > alterTable, and dropTable *(createTable does not follow this pattern)*.
> > > This may raise the concern that the routing logic is intrusive: every
> > > new format requires parallel changes across all three methods, and
> > > there is no single place that describes the full routing policy.
> > >
> > >
> > > *Questions for discussion*
> > >
> > >
> > > 1. Should Polaris determine the provider first (via metadata) and
> > > delegate to a single matching catalog, or should it attempt multiple
> > > sub-catalogs in a defined order?
> > >
> > > 2. If multiple sub-catalogs are supported, should there be a
> > > documented, deterministic resolution order (Iceberg -> Paimon ->
> > > Delta -> Hudi -> Polaris fallback)? Who owns that order, and should
> > > it be configurable by operators?
> > >
> > > 3. Should the per-format routing logic be centralised behind an
> > > abstraction (e.g. a SubCatalogRouter interface or a provider
> > > registry), so that adding a new format is a single registration
> > > rather than edits across loadTable, alterTable, and dropTable?
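
One possible shape for the registry mentioned in question 3 is sketched below. The name SubCatalogRouter comes from the message; everything else (RoutedCatalog, RecordingCatalog, and the method names) is hypothetical, and the Iceberg-first phase is left out to keep the sketch short.

```python
class SubCatalogRouter:
    """Maps a provider string to the sub-catalog that handles it."""

    def __init__(self, fallback):
        self._by_provider = {}
        self._fallback = fallback

    def register(self, provider, catalog):
        self._by_provider[provider] = catalog

    def resolve(self, provider):
        # Unknown or missing providers go to the fallback catalog.
        return self._by_provider.get(provider, self._fallback)


class RoutedCatalog:
    """Every operation shares one routing decision (hypothetical wrapper)."""

    def __init__(self, router, get_table_format):
        self._router = router
        self._get_table_format = get_table_format

    def _target(self, ident):
        return self._router.resolve(self._get_table_format(ident))

    def load_table(self, ident):
        return self._target(ident).load_table(ident)

    def alter_table(self, ident, change):
        return self._target(ident).alter_table(ident, change)

    def drop_table(self, ident):
        return self._target(ident).drop_table(ident)


class RecordingCatalog:
    """Stand-in sub-catalog that records the calls it receives."""

    def __init__(self, label):
        self.label = label
        self.calls = []

    def load_table(self, ident):
        self.calls.append(("load", ident))
        return (self.label, ident)

    def alter_table(self, ident, change):
        self.calls.append(("alter", ident))
        return (self.label, ident, change)

    def drop_table(self, ident):
        self.calls.append(("drop", ident))
        return True
```

With this shape, supporting another format is a single register call, and the routing policy lives in one place instead of being repeated in each operation.
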
> > >
> > > 4. Consistency: Should all table operations (loadTable, createTable,
> > > alterTable, dropTable, renameTable) follow the same routing strategy,
> > > or are per-operation differences acceptable? Currently createTable
> > > has a different branching structure from loadTable.
> > >
> > > 5. Is it in scope for Polaris to act as a routing layer for multiple
> > > table providers, or should users who need both Polaris and Paimon
> > > configure them as separate catalogs in their Spark session and route
> > > at the session level themselves?
> > >
> > >
> > > We have a working Paimon implementation today and would like to avoid
> > > locking in a pattern that becomes hard to extend. Any input on the
> > > design direction, or pointers to prior discussion on this topic, would
> > > be much appreciated.
> > >
> > >
> > > Best regards,
> > >
> > > I-Ting
> > >
> >
>


-- 
Dmitri Bourlatchkov
Senior Staff Software Engineer, Dremio