Hi all,

After self-reviewing the PR again, I think we can make the Paimon and Polaris integration idempotent as a further improvement.
The proposed flow is:

1. Check the Polaris metadata record first, as an early-return path.
   * If the table already exists in Polaris, load and return the table.
2. Check Paimon.
   * If the table already exists in Paimon, skip creation and continue to step 3.
   * If the table does not exist in Paimon, create the namespace in Paimon if needed, then create the table in Paimon.
3. Register the table in Polaris.

With this approach, if step 2 succeeds but step 3 fails, we can return a detailed exception to the client and let the client retry. On retry, step 2 finds the existing Paimon table, so only the Polaris registration is repeated. This should make table creation across both systems idempotent.

If this makes sense, I can make this improvement in a follow-up PR.

Thanks.

Best regards,
I-Ting

On Fri, May 22, 2026 at 3:18 AM Dmitri Bourlatchkov <[email protected]> wrote:

> Hi All,
>
> I'm bumping this thread because PR [3820] was mentioned here before.
>
> This discussion is interesting and useful. Still, on the practical side,
> how do you feel about merging [3820] now and working on Paimon-related
> improvements in follow-up PRs? Any objections?
>
> [3820] https://github.com/apache/polaris/pull/3820
>
> Thanks,
> Dmitri.
>
> On Sat, Mar 14, 2026 at 3:34 AM 李宜頲 <[email protected]> wrote:
>
> > Hi all,
> >
> > We are adding support for Paimon inside Polaris's SparkCatalog. Before we
> > add more formats, we would like to get community input on the intended
> > architecture.
> >
> > This discussion originated from a code review conversation in PR #3820
> > <https://github.com/apache/polaris/pull/3820#discussion_r2865885791>
> >
> > *Current design*
> >
> > When SparkCatalog.loadTable is called, the routing works in three phases:
> >
> > 1. Try the Iceberg catalog (icebergSparkCatalog.loadTable). If it succeeds,
> >    return immediately.
> >
> > 2. Call getTableFormat(ident), which makes a single HTTP GET to the Polaris
> >    server to read the provider property stored in the generic table
> >    metadata, without triggering any Spark DataSource resolution.
> >
> > 3. Route based on the provider string:
> >    - "paimon": delegate to Paimon's SparkCatalog
> >    - unknown/other: fall back to polarisSparkCatalog.loadTable, which
> >      performs full DataSource resolution
> >
> > The same three-phase pattern is repeated independently in loadTable,
> > alterTable, and dropTable *(but createTable does not follow this pattern)*.
> > It might raise the concern that this makes the routing logic intrusive:
> > every new format requires parallel changes across all three methods, and
> > there is no single place that describes the full routing policy.
> >
> > *Questions for discussion*
> >
> > 1. Should Polaris determine the provider first (via metadata) and delegate
> >    to a single matching catalog, or should it attempt multiple sub-catalogs
> >    in a defined order?
> >
> > 2. If multiple sub-catalogs are supported, should there be a documented,
> >    deterministic resolution order (Iceberg -> Paimon -> Delta -> Hudi ->
> >    Polaris fallback)? Who owns that order, and should it be configurable
> >    by operators?
> >
> > 3. Should the per-format routing logic be centralised behind an
> >    abstraction (e.g. a SubCatalogRouter interface or a provider registry),
> >    so that adding a new format is a single registration rather than edits
> >    across loadTable, alterTable, and dropTable?
> >
> > 4. Consistency: Should all table operations (loadTable, createTable,
> >    alterTable, dropTable, renameTable) follow the same routing strategy,
> >    or are per-operation differences acceptable? Currently createTable has
> >    a different branching structure from loadTable.
> >
> > 5. Is it in scope for Polaris to act as a routing layer for multiple
> >    table providers, or should users who need both Polaris and Paimon
> >    configure them as separate catalogs in their Spark session and route
> >    at the session level themselves?
> >
> > We have a working Paimon implementation today and would like to avoid
> > locking in a pattern that becomes hard to extend. Any input on the design
> > direction, or pointers to prior discussion on this topic, would be much
> > appreciated.
> >
> > Best regards,
> >
> > I-Ting
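For illustration, the idempotent create flow proposed at the top of this thread (check Polaris, then Paimon, then register) could look roughly like the Java sketch below. It is not Polaris or Paimon code: `PolarisRegistry`, `PaimonStore`, `RegistrationFailedException`, and the in-memory maps are hypothetical stand-ins for the Polaris generic-table metadata and Paimon's catalog, and `failuresToInject` only simulates a transient registration failure so the retry path can be shown.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed idempotent createTable flow.
// None of these types are real Polaris or Paimon APIs.
final class IdempotentCreateSketch {

    /** Hypothetical stand-in for Polaris generic-table metadata. */
    static final class PolarisRegistry {
        final Map<String, String> tables = new HashMap<>(); // ident -> provider
        int failuresToInject = 0; // number of register() calls that should fail

        boolean exists(String ident) { return tables.containsKey(ident); }

        void register(String ident, String provider) {
            if (failuresToInject > 0) {
                failuresToInject--;
                throw new IllegalStateException("Polaris unavailable");
            }
            tables.put(ident, provider);
        }
    }

    /** Hypothetical stand-in for Paimon's catalog. */
    static final class PaimonStore {
        final Set<String> namespaces = new HashSet<>();
        final Set<String> tables = new HashSet<>();
        int createCalls = 0; // counts physical table creations

        boolean tableExists(String ident) { return tables.contains(ident); }

        void createNamespaceIfNeeded(String ns) { namespaces.add(ns); }

        void createTable(String ident) {
            createCalls++;
            if (!tables.add(ident)) {
                throw new IllegalStateException("already exists: " + ident);
            }
        }
    }

    /** Paimon has the table but Polaris registration failed; the client may retry. */
    static final class RegistrationFailedException extends RuntimeException {
        RegistrationFailedException(String ident, Throwable cause) {
            super("Table " + ident + " exists in Paimon but is not registered in Polaris; retry createTable", cause);
        }
    }

    final PolarisRegistry polaris;
    final PaimonStore paimon;

    IdempotentCreateSketch(PolarisRegistry polaris, PaimonStore paimon) {
        this.polaris = polaris;
        this.paimon = paimon;
    }

    /** Returns "loaded" if Polaris already knew the table, "created" otherwise. */
    String createTable(String namespace, String name) {
        String ident = namespace + "." + name;
        // Step 1: early return if Polaris already has the table.
        if (polaris.exists(ident)) {
            return "loaded";
        }
        // Step 2: create in Paimon only if it is not already there.
        if (!paimon.tableExists(ident)) {
            paimon.createNamespaceIfNeeded(namespace);
            paimon.createTable(ident);
        }
        // Step 3: register in Polaris; on failure, surface a retryable error.
        try {
            polaris.register(ident, "paimon");
        } catch (RuntimeException e) {
            throw new RegistrationFailedException(ident, e);
        }
        return "created";
    }

    public static void main(String[] args) {
        PolarisRegistry polaris = new PolarisRegistry();
        PaimonStore paimon = new PaimonStore();
        IdempotentCreateSketch catalog = new IdempotentCreateSketch(polaris, paimon);

        polaris.failuresToInject = 1; // first registration attempt fails
        try {
            catalog.createTable("db", "events");
        } catch (RegistrationFailedException e) {
            System.out.println("first attempt: " + e.getMessage());
        }
        System.out.println("retry: " + catalog.createTable("db", "events"));   // retry: created
        System.out.println("again: " + catalog.createTable("db", "events"));   // again: loaded
        System.out.println("paimon createTable calls: " + paimon.createCalls); // 1
    }
}
```

Because step 2 checks before creating, a retry after a failed step 3 never issues a second Paimon create. Two clients creating the same table concurrently is a separate race that this sketch does not address.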
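Question 3 in the quoted proposal (a SubCatalogRouter interface or provider registry) could be explored with something like the Java sketch below. Only the name SubCatalogRouter comes from the question; `TableHandler`, `named`, and the two function parameters are hypothetical. The phase-1 Iceberg attempt is modelled as a boolean probe and the phase-2 metadata read as a lookup returning an optional provider string, so the routing policy lives in one `resolve` method and a new format is a single `register` call.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of a provider registry for per-format routing.
// Not Polaris code: TableHandler and SubCatalogRouter only mimic the shape of the problem.
final class SubCatalogRouterSketch {

    /** One handler per table format; a real one would wrap a Spark catalog. */
    interface TableHandler {
        String loadTable(String ident);
        String dropTable(String ident);
    }

    /** Toy handler that tags each result with its format name. */
    static TableHandler named(String format) {
        return new TableHandler() {
            @Override public String loadTable(String ident) { return format + ":load:" + ident; }
            @Override public String dropTable(String ident) { return format + ":drop:" + ident; }
        };
    }

    static final class SubCatalogRouter {
        private final Map<String, TableHandler> byProvider = new HashMap<>();
        private final TableHandler iceberg;
        private final TableHandler fallback;
        private final Predicate<String> isIcebergTable;                  // phase 1 probe
        private final Function<String, Optional<String>> providerLookup; // phase 2 metadata read

        SubCatalogRouter(TableHandler iceberg, TableHandler fallback,
                         Predicate<String> isIcebergTable,
                         Function<String, Optional<String>> providerLookup) {
            this.iceberg = iceberg;
            this.fallback = fallback;
            this.isIcebergTable = isIcebergTable;
            this.providerLookup = providerLookup;
        }

        /** Adding a format is one registration instead of edits in every operation. */
        SubCatalogRouter register(String provider, TableHandler handler) {
            byProvider.put(provider.toLowerCase(Locale.ROOT), handler);
            return this;
        }

        /** The whole routing policy lives here; unknown or missing providers fall back (phase 3). */
        TableHandler resolve(String ident) {
            if (isIcebergTable.test(ident)) {
                return iceberg;
            }
            return providerLookup.apply(ident)
                    .map(p -> byProvider.get(p.toLowerCase(Locale.ROOT)))
                    .orElse(fallback);
        }

        // Every operation shares resolve(), so per-operation routing cannot drift apart.
        String loadTable(String ident) { return resolve(ident).loadTable(ident); }
        String dropTable(String ident) { return resolve(ident).dropTable(ident); }
    }

    public static void main(String[] args) {
        Map<String, String> providers = Map.of("db.paimon_t", "paimon", "db.csv_t", "csv");
        SubCatalogRouter router = new SubCatalogRouter(
                named("iceberg"), named("polaris-fallback"),
                ident -> ident.startsWith("db.ice"),
                ident -> Optional.ofNullable(providers.get(ident)))
            .register("paimon", named("paimon"));

        System.out.println(router.loadTable("db.ice_t"));    // iceberg:load:db.ice_t
        System.out.println(router.loadTable("db.paimon_t")); // paimon:load:db.paimon_t
        System.out.println(router.dropTable("db.csv_t"));    // polaris-fallback:drop:db.csv_t
    }
}
```

Keeping the Iceberg probe ahead of the registry lookup preserves the current phase order. Switching to metadata-first routing, which is what question 1 asks about, would only change the body of `resolve`.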
