I also think this is a great idea. I just got back from the AWS re:Invent conference this week, and I heard many people bring up this need. I think it is also aligned with the project's goals of bringing cross-table-format interoperability, since the metastore or catalog is key to how you typically access open table formats. Whether you sync the metadata to one catalog or to several, this seemingly simple task is sorely needed. As more catalogs, such as Unity Catalog, Apache Polaris (incubating), Apache Gravitino, and DataHub, gain traction in the community, the need to sync to multiple catalogs may also grow.
Thanks,
Kyle

On Fri, Dec 6, 2024 at 12:16 AM Vinish Reddy <vin...@apache.org> wrote:
> Hello Apache XTable (Incubating) Community,
>
> This is a discussion regarding a new feature request I have created in GH.
> https://github.com/apache/incubator-xtable/issues/590
>
> *Context*
> Users of Apache XTable (Incubating) today can translate metadata across table formats (Iceberg, Hudi, and Delta) and use the tables in different platforms depending on their choice. However, there is still some usability friction because users need to explicitly register the tables <https://xtable.apache.org/docs/catalogs-index/> in the catalog of their choice (Glue, HMS, Unity, BigLake, etc.) and then use that catalog in the platform of their choice to run DDL and DML queries.
>
> XTable is built on the principle of omnidirectional interoperability, and I'm proposing an interface that allows syncing table-format metadata to multiple catalogs in a continuous and incremental manner.
>
> *Why do we need this feature?*
> 1. Reduce friction for XTable users - XTable sync will register the tables in the catalogs of their choice after metadata generation. Users who work with a single format can still use XTable to sync the metadata across multiple catalogs.
> 2. Avoid catalog lock-in - There's no reason why data/metadata in storage should be registered in only one catalog; users can register the table across multiple catalogs depending on the use case, ecosystem, and features provided by each catalog.
>
> *Implementation*
> I have submitted a PR with the interfaces for CatalogSyncClient and CatalogSyncOperations:
> https://github.com/apache/incubator-xtable/pull/591
>
> I'd welcome input and feedback from anyone in the community interested in collaborating on the design and implementation of this feature; please respond to this email or join the discussion directly on GitHub. Your input, whether design suggestions or implementation support, would be appreciated.
>
> Thanks
> Vinish