ahmedabu98 opened a new pull request, #35787:
URL: https://github.com/apache/beam/pull/35787

   Addresses #35637
   
   Refactors the Calcite Schema hierarchy in Beam SQL, introducing two new 
types: `CatalogManagerSchema` and `CatalogSchema`. This change addresses the 
limitation of the previous flat schema and enables true cross-database and 
cross-catalog operations.
   
   This builds upon the existing `BeamCalciteSchema` and improves user 
experience by allowing for more complex and flexible SQL queries. This is 
especially useful for users working with external metastores and tables that 
need to be referenced across different catalogs or databases.
   
   ## Enhanced Schema Hierarchy
   
   The following hierarchy is implemented (taken from 
https://s.apache.org/beam-catalogs)
   
   <div align="middle">
   <img height="400" alt="image" 
src="https://github.com/user-attachments/assets/f1f6444a-8af6-4aa6-aee6-bf24a9228872";
 />
   </div>
   
   Previously, all schemas were represented by the same `BeamCalciteSchema`, 
which made for a flat schema structure. This PR introduces a new hierarchy that 
more accurately reflects standard SQL organization:
   
   - `CatalogManagerSchema`: the root of the hierarchy
   - `CatalogSchema`: child nodes representing Catalogs
   - `BeamCalciteSchema` (existing): child nodes of `CatalogSchema` that 
represent Databases
   
   This new structure unlocks the ability to use SQL commands like `USE 
CATALOG`, `USE DATABASE`, and fully qualified table names, for example: 
`catalog.database.table`.
   
   ## Support for cross-catalog and cross-database queries
   This is the core benefit of this PR. Users can now perform operations that 
span multiple catalogs and databases, such as:
   
   ```sql
   INSERT INTO catalog_1.database_1.table_1 SELECT * FROM 
catalog_2.database_2.table_2;
   
   -- or
   USE CATALOG catalog_1;
   INSERT INTO database_1.table_1 SELECT * FROM catalog_2.database_2.table_2;
   
   -- or
   USE CATALOG catalog_1;
   USE DATABASE database_1;
   INSERT INTO table_1 SELECT * FROM catalog_2.database_2.table_2;
   
   -- or
   USE DATABASE catalog_1.database_1;
   INSERT INTO table_1 SELECT * FROM catalog_2.database_2.table_2;
   ```
   
   ## Improved usability for external tables
   
   Ease of use is significantly improved for external tables (like Iceberg). 
Previously, users had to manually register existing tables and databases with 
`CREATE EXTERNAL TABLE` or `CREATE DATABASE`. Now that we have abstractions for 
external metastore entities, the following commands are possible:
   
   - `INSERT INTO <table> ...` on an existing Table without prior registeration
   - `SELECT * FROM <table> ...` on an existing Table without prior 
registeration
   - `DROP TABLE <table> ...` on an existing Table without prior registeration
   - `USE DATABASE <database>` on an existing Database without prior 
registeration
   - `DROP DATABASE <database>` on an existing Database without prior 
registeration
   
   This is significant because it eliminates boilerplate code for users who 
manage external tables/databases using Beam SQL.
   
   ## Backwards compatibility
   To ensure a smooth transition, a "default" catalog with a "default" database 
is provided. For users who do not require cross-catalog or cross-database 
features, existing SQL commands will continue to function as before. For 
example, the command `CREATE EXTERNAL TABLE my_table ...` will create a table 
with path `default.default.my_table`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to