lcspinter opened a new pull request #2129: URL: https://github.com/apache/iceberg/pull/2129
The current catalog configuration is stored in the main Hive configuration by setting `iceberg.mr.catalog`. This works well when all of the tables in a dataset come from the same catalog, but it cannot express operations that involve tables from multiple catalogs. This PR solves that by implementing a Spark-like catalog configuration: the catalog definitions are stored in the main Hive configuration, the same way Spark handles them, and each table stores the name of its catalog and its table identifier. If no catalog name is defined on the table, a default catalog is used.

Here is an example of how to configure a Hadoop catalog. In the main Hive configuration we store the following properties:
- `iceberg.catalog.<catalog_name>.type` = `hadoop`
- `iceberg.catalog.<catalog_name>.warehouse` = `somelocation`

On the table level we have the following properties:
- `iceberg.mr.table.catalog` = `<catalog_name>`
- `iceberg.mr.table.identifier` = `<database.table_name>`

If the `iceberg.mr.table.catalog` property is missing from the table, we look for a catalog definition named `default`. If that is also missing, the original implementation is used, where the `iceberg.mr.catalog` property stores the catalog information.
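The fallback order described above (table-level catalog name, then a catalog named `default`, then the legacy `iceberg.mr.catalog` property) can be sketched roughly as follows. This is a minimal illustration using plain `Map`s in place of the Hive `Configuration` and table property objects; the method and class names here are hypothetical and do not reflect the actual code in the PR.

```java
import java.util.HashMap;
import java.util.Map;

public class CatalogResolutionSketch {
  // Property names as described in the PR text.
  static final String TABLE_CATALOG = "iceberg.mr.table.catalog";
  static final String LEGACY_CATALOG = "iceberg.mr.catalog";
  static final String CATALOG_PREFIX = "iceberg.catalog.";

  /**
   * Resolves which catalog applies to a table:
   * 1. the catalog named in the table's own properties, if it is defined in the conf;
   * 2. otherwise a catalog named "default", if one is defined;
   * 3. otherwise fall back to the legacy iceberg.mr.catalog setting (may be null).
   */
  static String resolveCatalog(Map<String, String> conf, Map<String, String> tableProps) {
    String name = tableProps.get(TABLE_CATALOG);
    if (name != null && conf.containsKey(CATALOG_PREFIX + name + ".type")) {
      return name;
    }
    if (conf.containsKey(CATALOG_PREFIX + "default.type")) {
      return "default";
    }
    return conf.get(LEGACY_CATALOG);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("iceberg.catalog.hadoop_cat.type", "hadoop");
    conf.put("iceberg.catalog.hadoop_cat.warehouse", "somelocation");

    Map<String, String> tableProps = new HashMap<>();
    tableProps.put("iceberg.mr.table.catalog", "hadoop_cat");
    tableProps.put("iceberg.mr.table.identifier", "db.tbl");

    // The table names its catalog explicitly, so that catalog wins.
    System.out.println(resolveCatalog(conf, tableProps)); // prints hadoop_cat
  }
}
```

If the table drops its `iceberg.mr.table.catalog` entry, the same call falls through to the `default` catalog (when one is configured) or to whatever `iceberg.mr.catalog` names, preserving backward compatibility with existing jobs.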
