pvary commented on issue #5163:
URL: https://github.com/apache/iceberg/issues/5163#issuecomment-1188008043
> @pvary Thanks for your advice. Yes, I can manually create the Hive
> external table and set the inputFormat to `HiveIcebergInputFormat` and other
> table properties, and Hive can then access the data. But the `registerTable`
> command (by setting `iceberg.engine.hive.enabled`) seems to be more convenient
> and stable (it is essentially the same as manual creation). I will create a PR
> soon for a "force" option on `registerTable`.
You do not have to manually create a hive external table and set the table
properties.
You can just do the following:
```
SET iceberg.catalog.hadoop_cat.type=hadoop;
SET iceberg.catalog.hadoop_cat.warehouse=hdfs://example.com:8020/hadoop_cat;
CREATE EXTERNAL TABLE database_a.table_a
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
TBLPROPERTIES ('iceberg.catalog'='hadoop_cat');
```
Or, if your table names do not match, you can do this:
```
CREATE EXTERNAL TABLE database_a.table_a
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://some_bucket/some_path/other_table_name'
TBLPROPERTIES ('iceberg.catalog'='location_based_table');
```
After this, `database_a.table_a` will be accessible from Hive, and it
can be queried like any normal Hive table.
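For example, assuming the `CREATE EXTERNAL TABLE` statement above succeeded, a quick check from Hive might look like the following. This is only a sketch; the output depends entirely on your table:
```
-- Inspect the table definition, including the storage handler and properties.
DESCRIBE FORMATTED database_a.table_a;

-- Read a few rows through the Iceberg storage handler.
SELECT * FROM database_a.table_a LIMIT 10;
```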
I think the main misunderstanding here is that you do not have to use
HiveCatalog to query an Iceberg table. You can use whatever Catalog
implementation you want to use, and tell the Hive table to use that specific
Catalog implementation to access the data/metadata of the table.
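As a rough sketch of that idea, the same pattern should work with a custom Catalog by setting `catalog-impl` instead of `type`. The catalog name `custom_cat`, the class `com.example.MyCatalog`, and the warehouse path are placeholders and not taken from this thread. The implementation is assumed to be on the Hive classpath, and which extra properties it needs depends on the Catalog itself:
```
-- Hypothetical catalog name and implementation class; replace with your own.
SET iceberg.catalog.custom_cat.catalog-impl=com.example.MyCatalog;
-- Any further keys under iceberg.catalog.custom_cat.* are passed to the Catalog.
SET iceberg.catalog.custom_cat.warehouse=hdfs://example.com:8020/custom_cat;

CREATE EXTERNAL TABLE database_a.table_a
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
TBLPROPERTIES ('iceberg.catalog'='custom_cat');
```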
Back to the original question:
- I think the external table approach is better if you want the table in the
target catalog to follow the changes of the table in the source catalog. If
there is a change committed to the original table, then the queries against the
Hive table will immediately reflect the changes.
- OTOH, if you want to access a specific snapshot of the table, then
`registerTable` could be useful. In this case, you should make sure that the
original catalog will not expire or remove files used by the tables
registered in the HiveCatalog. (I still would prefer to see a feature in Hive
for providing a specific snapshotId or metadata file location of a table in
table properties, instead of registering a table in HiveCatalog.)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]