ahmedabu98 opened a new pull request, #35223:
URL: https://github.com/apache/beam/pull/35223

   ## Motivation
   
   Modern data architectures, particularly data lakes and lakehouses (e.g., 
Apache Iceberg), heavily rely on catalogs for centralized metadata management. 
Beam SQL currently lacks this concept, limiting its interoperability and ease 
of use within these ecosystems.
   
   While Beam SQL has a 
[MetaStore](https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/store/MetaStore.java)
 for managing table definitions and providers, it operates as a single, flat 
namespace. Users cannot create or configure multiple, distinct metastores or 
easily switch between them. This PR introduces the concept of a Catalog as a 
higher-level organizational unit that leverages existing MetaStore capabilities 
for table management.
   
   ## Usage
   
   This PR introduces the following DDL commands for managing catalogs in Beam 
SQL:
   
   ### Create a new catalog
   
   ```sql
   CREATE CATALOG my_catalog 
   TYPE 'local'
   PROPERTIES (
     'foo', 'bar',
     'abc', 'xyz'
   )
   ```
   
   ### Set the current catalog
   
   ```sql
   SET CATALOG my_catalog
   ```
   
   ### Drop a catalog
   
   ```sql
   DROP CATALOG my_catalog
   ```
   
   ## Changes
   This change preserves backwards compatibility. It introduces two new 
interfaces and their in-memory implementations:
   
   ### _1. Catalog_
   
   This interface represents a single catalog and includes the following 
attributes:
   - name (`string`): Unique identifier for the catalog
   - type (`string`): The catalog's implementation (e.g. `local`, `iceberg`, 
etc.)
   - properties (`map<string, string>`): The catalog's configuration
   - metastore 
([MetaStore](https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/store/MetaStore.java)):
 Manages the catalog's tables and providers
   
   Tables are now scoped to a specific Catalog rather than to a single global 
namespace. For example, a Beam SQL table created in Catalog A is not visible 
after switching to Catalog B. To use it there, the table must be created again 
in Catalog B.
   
   A default in-memory catalog named `'default'` is automatically initialized.
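
The per-catalog table scoping described above can be illustrated with a small, self-contained model. This is only a sketch: `SketchCatalog` and its methods are hypothetical names invented for illustration, and a plain map stands in for the catalog's MetaStore. It does not reproduce the `Catalog` interface added by this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a catalog; this is NOT Beam's actual Catalog interface.
final class SketchCatalog {
    final String name;                     // unique identifier
    final String type;                     // e.g. "local"
    final Map<String, String> properties;  // catalog configuration
    // Stand-in for the catalog's MetaStore: table name -> column definition.
    private final Map<String, String> tables = new HashMap<>();

    SketchCatalog(String name, String type, Map<String, String> properties) {
        this.name = name;
        this.type = type;
        this.properties = Map.copyOf(properties);
    }

    // Registers a table in this catalog only; other catalogs are unaffected.
    void createTable(String tableName, String definition) {
        if (tables.putIfAbsent(tableName, definition) != null) {
            throw new IllegalArgumentException("Table already exists: " + tableName);
        }
    }

    boolean hasTable(String tableName) {
        return tables.containsKey(tableName);
    }

    public static void main(String[] args) {
        SketchCatalog a = new SketchCatalog("catalog_a", "local", Map.of("foo", "bar"));
        SketchCatalog b = new SketchCatalog("catalog_b", "local", Map.of());

        a.createTable("orders", "id BIGINT, total DOUBLE");
        System.out.println(a.hasTable("orders")); // true
        System.out.println(b.hasTable("orders")); // false: B has its own scope

        // B needs its own definition before it can see a table named "orders".
        b.createTable("orders", "id BIGINT, total DOUBLE");
        System.out.println(b.hasTable("orders")); // true
    }
}
```

Keeping the table map private to each catalog instance is what makes the scoping automatic in this model: there is no shared registry that a second catalog could read from.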
   
   
   ### _2. CatalogManager_
   
   This interface is effectively the new root schema in Beam SQL's Calcite 
integration. Previously, an 
[InMemoryMetaStore](https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/store/InMemoryMetaStore.java)
 served as the root. With CatalogManager as the top-level container, individual 
Catalog instances can be created and registered as sub-schemas. This allows 
CatalogManager to do the following:
   
   - create, manage, and drop catalogs
   - switch between different active catalogs
   - register global TableProviders that become available to all catalogs
   
   The PR includes implementations for these two interfaces:
   - `InMemoryCatalog` implements `Catalog`
   - `InMemoryCatalogManager` implements `CatalogManager`
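
As a rough illustration of the three responsibilities listed above, the following sketch models a manager that owns several catalogs, tracks the active one, and shares table providers globally. All names here (`SketchCatalogManager`, `createCatalog`, `useCatalog`, and so on) are hypothetical and chosen for this example; they are not claimed to match the `CatalogManager` interface in this PR. Refusing to drop the active catalog is likewise an assumption of the sketch, not behavior stated in the description.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a catalog manager; NOT Beam's actual CatalogManager interface.
final class SketchCatalogManager {
    static final String DEFAULT = "default";

    // Catalog name -> that catalog's tables (stand-in for a per-catalog MetaStore).
    private final Map<String, Set<String>> catalogs = new HashMap<>();
    // Table provider types registered globally, visible from every catalog.
    private final Set<String> globalProviders = new HashSet<>();
    private String current = DEFAULT;

    SketchCatalogManager() {
        catalogs.put(DEFAULT, new HashSet<>()); // a default catalog exists from the start
    }

    void createCatalog(String name) {
        if (catalogs.putIfAbsent(name, new HashSet<>()) != null) {
            throw new IllegalArgumentException("Catalog already exists: " + name);
        }
    }

    void useCatalog(String name) {
        if (!catalogs.containsKey(name)) {
            throw new IllegalArgumentException("Unknown catalog: " + name);
        }
        current = name;
    }

    // Assumption of this sketch: the active catalog cannot be dropped.
    void dropCatalog(String name) {
        if (name.equals(current)) {
            throw new IllegalStateException("Cannot drop the active catalog: " + name);
        }
        if (catalogs.remove(name) == null) {
            throw new IllegalArgumentException("Unknown catalog: " + name);
        }
    }

    String currentCatalog() {
        return current;
    }

    void registerTableProvider(String type) {
        globalProviders.add(type);
    }

    // Creates a table in the active catalog; the provider type must be registered.
    void createTable(String table, String providerType) {
        if (!globalProviders.contains(providerType)) {
            throw new IllegalArgumentException("No provider for type: " + providerType);
        }
        catalogs.get(current).add(table);
    }

    // Looks up a table in the active catalog only.
    boolean hasTable(String table) {
        return catalogs.get(current).contains(table);
    }

    public static void main(String[] args) {
        SketchCatalogManager manager = new SketchCatalogManager();
        manager.registerTableProvider("text");

        manager.createCatalog("my_catalog");
        manager.useCatalog("my_catalog");
        manager.createTable("orders", "text");
        System.out.println(manager.hasTable("orders")); // true

        manager.useCatalog(DEFAULT);
        System.out.println(manager.hasTable("orders")); // false: scoped to my_catalog
        manager.createTable("events", "text");          // same provider, different catalog

        manager.dropCatalog("my_catalog");
        System.out.println(manager.currentCatalog());   // default
    }
}
```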

