lcspinter opened a new pull request #2129:
URL: https://github.com/apache/iceberg/pull/2129


   The current catalog configuration is stored in the main Hive configuration 
by setting the `iceberg.mr.catalog` property. This works well when all of the 
tables the user is working with come from the same catalog. 
   For operations involving multiple tables from different catalogs, however, 
this implementation cannot serve the need. 
   
   This PR provides a solution by implementing a Spark-like catalog 
configuration. Catalogs are defined in the main Hive configuration, the same 
way they are handled in Spark, and each table stores the name of its catalog 
and its table identifier. If no catalog name is defined on the table, a 
default catalog is used.
   
   Here is an example of how to configure a Hadoop catalog.
   
   In the main Hive configuration we store the following properties:
   - `iceberg.catalog.<catalog_name>.type` = `hadoop`
   - `iceberg.catalog.<catalog_name>.warehouse` = `<some_location>`
   
   On the table level we have the following properties:
   - `iceberg.mr.table.catalog` = `<catalog_name>`
   - `iceberg.mr.table.identifier` = `<database.table_name>`
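   As a concrete sketch of the configuration above (the catalog name 
`hadoop_cat`, the warehouse path, and the use of `SET`/`TBLPROPERTIES` to 
apply the properties are illustrative assumptions, not part of this PR):
   
   ```sql
   -- Define a Hadoop catalog named hadoop_cat in the Hive session
   -- (placeholder values; in practice these could also live in hive-site.xml)
   SET iceberg.catalog.hadoop_cat.type=hadoop;
   SET iceberg.catalog.hadoop_cat.warehouse=hdfs://namenode:8020/warehouse;
   
   -- Point a table at that catalog via table-level properties
   ALTER TABLE db.tbl SET TBLPROPERTIES (
     'iceberg.mr.table.catalog'='hadoop_cat',
     'iceberg.mr.table.identifier'='db.tbl'
   );
   ```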
   
   If the `iceberg.mr.table.catalog` property is missing from the table, the 
implementation looks for a catalog definition named "default". If that is 
also missing, the original behavior is used, where the 
`iceberg.mr.catalog` property stores the catalog information. 
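   The fallback order described above can be sketched as follows. This is a 
minimal illustration of the resolution logic, not the PR's actual code; the 
class and method names (`CatalogResolution`, `resolveCatalogName`) and the 
`iceberg.catalog.default.type` key used to detect a "default" catalog are 
assumptions for the sketch:
   
   ```java
   import java.util.Map;
   import java.util.Optional;
   
   // Hypothetical sketch of the catalog resolution order:
   // 1. the catalog named in the table properties,
   // 2. a catalog named "default", if one is configured,
   // 3. empty, meaning fall back to the legacy iceberg.mr.catalog behavior.
   public class CatalogResolution {
     static final String TABLE_CATALOG = "iceberg.mr.table.catalog";
     static final String DEFAULT_CATALOG_TYPE = "iceberg.catalog.default.type";
   
     static Optional<String> resolveCatalogName(
         Map<String, String> conf, Map<String, String> tableProps) {
       if (tableProps.containsKey(TABLE_CATALOG)) {
         // Catalog explicitly named on the table wins
         return Optional.of(tableProps.get(TABLE_CATALOG));
       }
       if (conf.containsKey(DEFAULT_CATALOG_TYPE)) {
         // A catalog named "default" is configured in the Hive conf
         return Optional.of("default");
       }
       // Caller falls back to the legacy iceberg.mr.catalog property
       return Optional.empty();
     }
   }
   ```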


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


