ahmedabu98 opened a new pull request, #35223: URL: https://github.com/apache/beam/pull/35223
## Motivation

Modern data architectures, particularly data lakes and lakehouses (e.g., Apache Iceberg), heavily rely on catalogs for centralized metadata management. Beam SQL currently lacks this concept, limiting its interoperability and ease of use within these ecosystems.

While Beam SQL has a [MetaStore](https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/store/MetaStore.java) for managing table definitions and providers, it operates as a single, flat namespace. Users cannot create or configure multiple, distinct metastores or easily switch between them.

This PR introduces the concept of a Catalog as a higher-level organizational unit that leverages existing MetaStore capabilities for table management.

## Usage

This PR introduces the following DDL commands for managing catalogs in Beam SQL:

### Create a new catalog

```sql
CREATE CATALOG my_catalog
TYPE 'local'
PROPERTIES (
  'foo', 'bar',
  'abc', 'xyz'
)
```

### Set the current catalog

```sql
SET CATALOG my_catalog
```

### Drop a catalog

```sql
DROP CATALOG my_catalog
```

## Changes

This change preserves backwards compatibility. It introduces two new interfaces and their in-memory implementations:

### _1. Catalog_

This generally represents the aforementioned catalog, and includes the following attributes:
- name (`string`): Unique identifier for the catalog
- type (`string`): The catalog's implementation (e.g. `local`, `iceberg`, etc.)
- properties (`map<string, string>`): The catalog's configuration
- ([MetaStore](https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/store/MetaStore.java)): Manages the catalog's tables and providers

Tables are now scoped within a specific Catalog rather than a global scope. For example, a Beam SQL Table created in Catalog A will not be available when switching to Catalog B. Catalog B would have to create a new reference for its scope.
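To illustrate the scoping rule, here is a minimal sketch of a catalog that owns its own table store. `CatalogScopeSketch` and `SketchCatalog` are hypothetical names made up for this example, and the plain `Map` only stands in for the per-catalog MetaStore; none of this is the actual `Catalog` interface added by the PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative only: hypothetical names, not the interfaces added by this PR. */
public class CatalogScopeSketch {

  static final class SketchCatalog {
    private final String name;
    private final String type;
    private final Map<String, String> properties;
    // Stand-in for the per-catalog MetaStore: table name -> table definition.
    private final Map<String, String> tables = new HashMap<>();

    SketchCatalog(String name, String type, Map<String, String> properties) {
      this.name = name;
      this.type = type;
      this.properties = Map.copyOf(properties);
    }

    String name() {
      return name;
    }

    String type() {
      return type;
    }

    Map<String, String> properties() {
      return properties;
    }

    /** Registers a table in this catalog only. */
    void createTable(String tableName, String definition) {
      if (tables.putIfAbsent(tableName, definition) != null) {
        throw new IllegalStateException("Table already exists: " + tableName);
      }
    }

    /** Looks up a table in this catalog's scope; other catalogs are never consulted. */
    Optional<String> findTable(String tableName) {
      return Optional.ofNullable(tables.get(tableName));
    }
  }

  public static void main(String[] args) {
    SketchCatalog a = new SketchCatalog("catalog_a", "local", Map.of("foo", "bar"));
    SketchCatalog b = new SketchCatalog("catalog_b", "local", Map.of());

    a.createTable("orders", "id BIGINT, total DOUBLE");

    System.out.println(a.findTable("orders").isPresent()); // true
    System.out.println(b.findTable("orders").isPresent()); // false
  }
}
```

In this sketch, making `orders` visible from `catalog_b` requires a second `createTable` call on `b`, which corresponds to Catalog B creating its own reference.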
A default in-memory catalog named `'default'` is automatically initialized.

### _2. CatalogManager_

This interface is effectively the new root schema in Beam SQL's Calcite integration. Previously, an [InMemoryMetaStore](https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/store/InMemoryMetaStore.java) served as the root. With CatalogManager as the top-level container, individual Catalog instances can be created and registered as sub-schemas. This allows CatalogManager to do the following:
- create, manage, and drop catalogs
- switch between different active catalogs
- register global TableProviders that become available to all catalogs

The PR includes implementations for these two interfaces:
- `InMemoryCatalog` implements `Catalog`
- `InMemoryCatalogManager` implements `CatalogManager`
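The CatalogManager responsibilities listed above can be sketched roughly as follows. All class and method names here (`CatalogManagerSketch`, `SketchCatalogManager`, `useCatalog`, and so on) are hypothetical and are not the signatures introduced by the PR. The sketch also makes two assumptions of its own: the default catalog has type `local`, and the active catalog cannot be dropped.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: hypothetical names, not the interfaces added by this PR. */
public class CatalogManagerSketch {

  /** Minimal catalog: name, type, properties, and the provider types it can see. */
  static final class SketchCatalog {
    final String name;
    final String type;
    final Map<String, String> properties;
    final Set<String> providerTypes = new LinkedHashSet<>();

    SketchCatalog(String name, String type, Map<String, String> properties) {
      this.name = name;
      this.type = type;
      this.properties = Map.copyOf(properties);
    }
  }

  static final class SketchCatalogManager {
    private final Map<String, SketchCatalog> catalogs = new LinkedHashMap<>();
    private final Set<String> globalProviderTypes = new LinkedHashSet<>();
    private SketchCatalog current;

    SketchCatalogManager() {
      // Mirrors the automatically initialized catalog named 'default'.
      // Its type ("local") is an assumption of this sketch.
      createCatalog("default", "local", Map.of());
      current = catalogs.get("default");
    }

    /** CREATE CATALOG: register a catalog that sees every global provider. */
    void createCatalog(String name, String type, Map<String, String> properties) {
      if (catalogs.containsKey(name)) {
        throw new IllegalArgumentException("Catalog already exists: " + name);
      }
      SketchCatalog catalog = new SketchCatalog(name, type, properties);
      catalog.providerTypes.addAll(globalProviderTypes);
      catalogs.put(name, catalog);
    }

    /** SET CATALOG: switch the active catalog. */
    void useCatalog(String name) {
      SketchCatalog catalog = catalogs.get(name);
      if (catalog == null) {
        throw new IllegalArgumentException("No such catalog: " + name);
      }
      current = catalog;
    }

    /** DROP CATALOG. Assumption of this sketch: the active catalog cannot be dropped. */
    void dropCatalog(String name) {
      if (current.name.equals(name)) {
        throw new IllegalStateException("Cannot drop the active catalog: " + name);
      }
      if (catalogs.remove(name) == null) {
        throw new IllegalArgumentException("No such catalog: " + name);
      }
    }

    /** Registers a provider type globally, for existing and future catalogs. */
    void registerTableProvider(String providerType) {
      globalProviderTypes.add(providerType);
      for (SketchCatalog catalog : catalogs.values()) {
        catalog.providerTypes.add(providerType);
      }
    }

    SketchCatalog currentCatalog() {
      return current;
    }

    boolean hasCatalog(String name) {
      return catalogs.containsKey(name);
    }
  }

  public static void main(String[] args) {
    SketchCatalogManager manager = new SketchCatalogManager();
    manager.registerTableProvider("text");
    manager.createCatalog("my_catalog", "local", Map.of("foo", "bar"));
    manager.useCatalog("my_catalog");
    System.out.println(manager.currentCatalog().name); // my_catalog
    System.out.println(manager.currentCatalog().providerTypes); // [text]

    manager.useCatalog("default");
    manager.dropCatalog("my_catalog");
    System.out.println(manager.hasCatalog("my_catalog")); // false
  }
}
```

The `main` method walks through the same sequence as the DDL commands: create a catalog, set it as current, then switch back and drop it.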
