Hi I-Ting,

What do you mean by "table already exists in Paimon"? Do you mean a Generic
Table in Polaris terminology?

Thanks,
Dmitri.

On Sat, May 23, 2026 at 12:15 PM ITing Lee wrote:

> Hi all,
>
> After reviewing the PR again, I think we can make the Paimon and Polaris
> integration idempotent as a further improvement.
>
> The proposed flow is:
>
> 1. Check the Polaris metadata record first as an early return path.
>    * If the table already exists in Polaris, return/load the table.
>
> 2. Check Paimon.
>    * If the table already exists in Paimon, pass.
>    * If the table does not exist in Paimon, create the namespace in Paimon
>      if needed, then create the table in Paimon.
>
> 3. Register the table in Polaris.
>
> With this approach, even if step 2 succeeds but step 3 fails, we can return
> a detailed exception to the client and allow the client to retry. This
> should make table creation across both systems idempotent.
>
> If this makes sense, I can make this improvement in a follow-up PR.
> Thanks.
>
> Best regards,
> I-Ting
>
> Dmitri Bourlatchkov wrote on Fri, May 22, 2026 at 3:18 AM:
>
> > Hi All,
> >
> > I'm bumping this thread because PR [3820] was mentioned here before.
> >
> > This discussion is interesting and useful. Still, on the practical side,
> > how do you feel about merging [3820] now and working on Paimon-related
> > improvements in follow-up PRs? Any objections?
> >
> > [3820] https://github.com/apache/polaris/pull/3820
> >
> > Thanks,
> > Dmitri.
> >
> > On Sat, Mar 14, 2026 at 3:34 AM 李宜頲 wrote:
> >
> > > Hi all,
> > >
> > > We are adding support for Paimon inside Polaris's SparkCatalog. Before we
> > > add more formats, we would like to get community input on the intended
> > > architecture.
> > >
> > > This discussion originated from a code review conversation in PR #3820
> > > <https://github.com/apache/polaris/pull/3820#discussion_r2865885791>
> > >
> > > *Current design*
> > >
> > > When SparkCatalog.loadTable is called, the routing works in three phases:
> > >
> > > 1. Try the Iceberg catalog (icebergSparkCatalog.loadTable). If it
> > >    succeeds, return immediately.
> > >
> > > 2. Call getTableFormat(ident), which makes a single HTTP GET to the
> > >    Polaris server to read the provider property stored in the generic
> > >    table metadata, without triggering any Spark DataSource resolution.
> > >
> > > 3. Route based on the provider string:
> > >
> > >    - "paimon": delegate to Paimon's SparkCatalog
> > >
> > >    - unknown/other: fall back to polarisSparkCatalog.loadTable, which
> > >      performs full DataSource resolution
> > >
> > > The same three-phase pattern is repeated independently in loadTable,
> > > alterTable, and dropTable *(createTable does not follow this pattern)*.
> > > This may raise the concern that the routing logic becomes intrusive:
> > > every new format requires parallel changes across all three methods, and
> > > there is no single place that describes the full routing policy.
> > >
> > > *Questions for discussion*
> > >
> > > 1. Should Polaris determine the provider first (via metadata) and
> > >    delegate to a single matching catalog, or should it attempt multiple
> > >    sub-catalogs in a defined order?
> > >
> > > 2. If multiple sub-catalogs are supported, should there be a documented,
> > >    deterministic resolution order (Iceberg -> Paimon -> Delta -> Hudi ->
> > >    Polaris fallback)? Who owns that order, and should it be configurable
> > >    by operators?
> > >
> > > 3. Should the per-format routing logic be centralised behind an
> > >    abstraction (e.g.
> > >    a SubCatalogRouter interface or a provider registry), so that adding
> > >    a new format is a single registration rather than edits across
> > >    loadTable, alterTable, and dropTable?
> > >
> > > 4. Consistency: Should all table operations (loadTable, createTable,
> > >    alterTable, dropTable, renameTable) follow the same routing strategy,
> > >    or are per-operation differences acceptable? Currently createTable
> > >    has a different branching structure from loadTable.
> > >
> > > 5. Is it in scope for Polaris to act as a routing layer for multiple
> > >    table providers, or should users who need both Polaris and Paimon
> > >    configure them as separate catalogs in their Spark session and route
> > >    at the session level themselves?
> > >
> > > We have a working Paimon implementation today and would like to avoid
> > > locking in a pattern that becomes hard to extend. Any input on the design
> > > direction, or pointers to prior discussion on this topic, would be much
> > > appreciated.
> > >
> > > Best regards,
> > >
> > > I-Ting

--
Dmitri Bourlatchkov
Senior Staff Software Engineer, Dremio
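
For reference, the three-step idempotent creation flow proposed in the May 23 message could be sketched roughly as follows. This is only an illustration in Python, not the Polaris code (which is Java): FakePolaris, FakePaimon, RegistrationError, and create_table are hypothetical names, and both stores are in-memory stand-ins. The sketch also assumes that "exists in Paimon" means Paimon's own catalog as opposed to a Polaris generic table record, which is the point the question at the top of this thread asks to clarify.

```python
from dataclasses import dataclass, field


class RegistrationError(Exception):
    """Hypothetical error: Paimon holds the table but Polaris registration failed."""


@dataclass
class FakePaimon:
    """Hypothetical in-memory stand-in for Paimon's own catalog."""
    namespaces: set = field(default_factory=set)
    tables: set = field(default_factory=set)

    def table_exists(self, ns, name):
        return (ns, name) in self.tables

    def create_table(self, ns, name):
        self.namespaces.add(ns)  # create the namespace if needed
        self.tables.add((ns, name))


@dataclass
class FakePolaris:
    """Hypothetical in-memory stand-in for Polaris table metadata."""
    tables: dict = field(default_factory=dict)
    fail_next_register: bool = False  # hook to simulate a failed step 3

    def load(self, ns, name):
        return self.tables.get((ns, name))

    def register(self, ns, name, provider):
        if self.fail_next_register:
            self.fail_next_register = False
            raise ConnectionError("simulated Polaris outage")
        self.tables[(ns, name)] = {"name": name, "provider": provider}
        return self.tables[(ns, name)]


def create_table(polaris, paimon, ns, name):
    # Step 1: early return if Polaris already has the metadata record.
    existing = polaris.load(ns, name)
    if existing is not None:
        return existing

    # Step 2: make sure the table exists in Paimon; an existing table is a no-op.
    if not paimon.table_exists(ns, name):
        paimon.create_table(ns, name)

    # Step 3: register in Polaris; on failure, report a detailed, retryable error.
    try:
        return polaris.register(ns, name, provider="paimon")
    except ConnectionError as exc:
        raise RegistrationError(
            f"{ns}.{name} exists in Paimon but is not registered in Polaris; "
            "retry the create call"
        ) from exc
```

In this sketch, a retry after a failed step 3 skips the Paimon creation and only repeats the registration, so calling create_table again converges on the same end state.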
