ahmedabu98 opened a new pull request, #35787: URL: https://github.com/apache/beam/pull/35787
Addresses #35637 Refactors the Calcite Schema hierarchy in Beam SQL, introducing two new types: `CatalogManagerSchema` and `CatalogSchema`. This change addresses the limitation of the previous flat schema and enables true cross-database and cross-catalog operations. This builds upon the existing `BeamCalciteSchema` and improves user experience by allowing for more complex and flexible SQL queries. This is especially useful for users working with external metastores and tables that need to be referenced across different catalogs or databases. ## Enhanced Schema Hierarchy The following hierarchy is implemented (taken from https://s.apache.org/beam-catalogs) <div align="middle"> <img height="400" alt="image" src="https://github.com/user-attachments/assets/f1f6444a-8af6-4aa6-aee6-bf24a9228872" /> </div> Previously, all schemas were represented by the same `BeamCalciteSchema`, which made for a flat schema structure. This PR introduces a new hierarchy that more accurately reflects standard SQL organization: - `CatalogManagerSchema`: the root of the hierarchy - `CatalogSchema`: child nodes representing Catalogs - `BeamCalciteSchema` (existing): child nodes of `CatalogSchema` that represent Databases This new structure unlocks the ability to use SQL commands like `USE CATALOG`, `USE DATABASE`, and fully qualified table names, for example: `catalog.database.table`. ## Support for cross-catalog and cross-database queries This is the core benefit of this PR. Users can now perform operations that span multiple catalogs and databases, such as: ```sql INSERT INTO catalog_1.database_1.table_1 SELECT * FROM catalog_2.database_2.table_2; -- or USE CATALOG catalog_1; INSERT INTO database_1.table_1 SELECT * FROM catalog_2.database_2.table_2; -- or USE CATALOG catalog_1; USE DATABASE database_1; INSERT INTO table_1 SELECT * FROM catalog_2.database_2.table_2; -- or USE DATABASE catalog_1.database_1; INSERT INTO table_1 SELECT * FROM catalog_2.database_2.table_2; ``` ## Improved usability for external tables Ease of use is significantly improved for external tables (like Iceberg). Previously, users had to manually register existing tables and databases with `CREATE EXTERNAL TABLE` or `CREATE DATABASE`. Now that we have abstractions for external metastore entities, the following commands are possible: - `INSERT INTO <table> ...` on an existing Table without prior registeration - `SELECT * FROM <table> ...` on an existing Table without prior registeration - `DROP TABLE <table> ...` on an existing Table without prior registeration - `USE DATABASE <database>` on an existing Database without prior registeration - `DROP DATABASE <database>` on an existing Database without prior registeration This is significant because it eliminates boilerplate code for users who manage external tables/databases using Beam SQL. ## Backwards compatibility To ensure a smooth transition, a "default" catalog with a "default" database is provided. For users who do not require cross-catalog or cross-database features, existing SQL commands will continue to function as before. For example, the command `CREATE EXTERNAL TABLE my_table ...` will create a table with path `default.default.my_table`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org