I also think this is a great idea. I just came back from the AWS re:Invent
conference this week, and I heard many people bring up this need. It is
also aligned with the project's goal of cross-table-format
interoperability, since the metastore or catalog is central to how you
typically access open table formats. Whether the metadata is synced to one
catalog or to several, this simple capability is sorely needed. More
catalogs are gaining traction in the community, such as Unity Catalog,
Apache Polaris (incubating), Apache Gravitino, and DataHub, so the need to
sync to multiple catalogs is likely to grow as well.

Thanks,
Kyle

On Fri, Dec 6, 2024 at 12:16 AM Vinish Reddy <vin...@apache.org> wrote:

> Hello Apache XTable (Incubating) Community,
>
> I'd like to start a discussion about a new feature request I created on
> GitHub: https://github.com/apache/incubator-xtable/issues/590
>
> *Context*
> Users of Apache XTable (Incubating) can already translate metadata across
> table formats (Iceberg, Hudi, and Delta) and use the tables on whichever
> platforms they choose. There is still some usability friction, however,
> because users must explicitly register the tables
> <https://xtable.apache.org/docs/catalogs-index/> in the catalog of their
> choice (Glue, HMS, Unity, BigLake, etc.) and then use that catalog from
> their platform of choice to run DDL and DML queries.
>
> XTable is built on the principle of omnidirectional interoperability, so
> I'm proposing an interface that syncs table format metadata to multiple
> catalogs continuously and incrementally.
>
>
> *Why do we need this feature?*
> 1. Reduce friction for XTable users: after generating the metadata, XTable
> sync will register the tables in the catalogs of the user's choice. Users
> working with a single table format can still use XTable to sync its
> metadata across multiple catalogs.
> 2. Avoid catalog lock-in: there's no reason data and metadata in storage
> should be registered in only one catalog. Users can register a table in
> multiple catalogs depending on their use case, ecosystem, and the
> features each catalog provides.
>
> *Implementation*
> I have submitted a PR with the CatalogSyncClient and CatalogSyncOperations
> interfaces:
> https://github.com/apache/incubator-xtable/pull/591
>
> If you're interested in collaborating on the design and implementation of
> this feature, please respond to this email or join the discussion directly
> on GitHub. Any input, whether design suggestions or implementation
> support, would be appreciated.
>
> Thanks
> Vinish
>
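
The quoted proposal leaves the interface details to PR #591. As a rough illustration of the "register the tables after metadata generation" step, here is a minimal Java sketch (Java 16+ for records). Everything in it is hypothetical: CatalogClient, InMemoryCatalog, TableRef, and syncToCatalogs are illustrative names chosen here, not the CatalogSyncClient or CatalogSyncOperations interfaces from the PR, the in-memory catalog merely stands in for real clients such as Glue or HMS, and the s3:// paths are placeholders.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: all names and methods are illustrative assumptions,
// not the interfaces proposed in apache/incubator-xtable PR #591.
public class CatalogSyncSketch {

    /** Table identity plus the metadata location produced by the format sync step. */
    record TableRef(String database, String table, String metadataLocation) {}

    /** One target catalog (e.g. Glue, HMS, Unity). Assumed operations only. */
    interface CatalogClient {
        String name();
        boolean hasTable(String database, String table);
        void createTable(TableRef ref);
        void refreshTable(TableRef ref); // point an existing entry at the new metadata
    }

    /** In-memory stand-in for a real catalog, used only to make the sketch runnable. */
    static class InMemoryCatalog implements CatalogClient {
        private final String name;
        final Map<String, String> tables = new HashMap<>(); // "db.table" -> metadata location

        InMemoryCatalog(String name) { this.name = name; }

        public String name() { return name; }

        public boolean hasTable(String database, String table) {
            return tables.containsKey(database + "." + table);
        }

        public void createTable(TableRef ref) {
            tables.put(ref.database() + "." + ref.table(), ref.metadataLocation());
        }

        public void refreshTable(TableRef ref) {
            tables.put(ref.database() + "." + ref.table(), ref.metadataLocation());
        }
    }

    /**
     * Registers the table in every configured catalog, or refreshes it if it is
     * already registered. A failure in one catalog is recorded and does not stop
     * the sync to the remaining catalogs. Returns catalog name -> outcome.
     */
    static Map<String, String> syncToCatalogs(TableRef ref, List<CatalogClient> catalogs) {
        Map<String, String> results = new LinkedHashMap<>(); // preserves catalog order
        for (CatalogClient catalog : catalogs) {
            try {
                if (catalog.hasTable(ref.database(), ref.table())) {
                    catalog.refreshTable(ref);
                    results.put(catalog.name(), "REFRESHED");
                } else {
                    catalog.createTable(ref);
                    results.put(catalog.name(), "CREATED");
                }
            } catch (RuntimeException e) {
                results.put(catalog.name(), "FAILED: " + e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        InMemoryCatalog glue = new InMemoryCatalog("glue");
        InMemoryCatalog hms = new InMemoryCatalog("hms");
        List<CatalogClient> targets = List.of(glue, hms);

        // First sync after metadata generation: the table is new in both catalogs.
        TableRef v1 = new TableRef("sales", "orders", "s3://bucket/orders/metadata/v1");
        System.out.println(syncToCatalogs(v1, targets)); // {glue=CREATED, hms=CREATED}

        // A later incremental sync only moves the catalog entries to the new metadata.
        TableRef v2 = new TableRef("sales", "orders", "s3://bucket/orders/metadata/v2");
        System.out.println(syncToCatalogs(v2, targets)); // {glue=REFRESHED, hms=REFRESHED}
    }
}
```

The create-or-refresh check makes repeated syncs idempotent, which is one way to support the continuous, incremental syncing described in the proposal. Recording per-catalog failures instead of aborting is an assumed design choice, so that one unavailable catalog does not block registration in the others.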
