[GitHub] [iceberg] pvary commented on issue #5163: Support catalog method to set table metadata

GitBox Mon, 18 Jul 2022 11:10:06 -0700


pvary commented on issue #5163:
URL: https://github.com/apache/iceberg/issues/5163#issuecomment-1188008043


   > @pvary Thanks for your advice. Yes, if I manually create the Hive 
external table and set the inputFormat to `HiveIcebergInputFormat` and other 
table properties, Hive can access the data. But the `registerTable` 
command (by setting `iceberg.engine.hive.enabled`) seems to be more convenient 
and stable (it is essentially the same as manual creation). I will create a PR 
soon for a "force" option on `registerTable`.
   
   You do not have to manually create a hive external table and set the table 
properties.
   You can just do the following:
   ```
   SET iceberg.catalog.hadoop_cat.type=hadoop;
   SET iceberg.catalog.hadoop_cat.warehouse=hdfs://example.com:8020/hadoop_cat;
   
   CREATE EXTERNAL TABLE database_a.table_a
   STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
   TBLPROPERTIES ('iceberg.catalog'='hadoop_cat');
   ```
   
   Or, if your table names do not match, you can do this:
   ```
   CREATE EXTERNAL TABLE database_a.table_a
   STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
   LOCATION 'hdfs://some_bucket/some_path/other_table_name'
   TBLPROPERTIES ('iceberg.catalog'='location_based_table');
   ```
   
   After this, `database_a.table_a` will be accessible from Hive, and it 
can be queried like any normal Hive table.
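   
   As a minimal sketch, assuming the table was created with one of the 
statements above, ordinary HiveQL should then work against it (the `LIMIT` 
value is arbitrary and only for illustration):
   ```sql
   -- Inspect the table definition that Hive stored for the Iceberg table
   DESCRIBE FORMATTED database_a.table_a;
   
   -- Read a few rows through the Iceberg storage handler
   SELECT * FROM database_a.table_a LIMIT 10;
   ```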
   
   I think the main misunderstanding here is that you do not have to use 
HiveCatalog to query an Iceberg table. You can use whatever Catalog 
implementation you want to use, and tell the Hive table to use that specific 
Catalog implementation to access the data/metadata of the table.
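   
   For example, a custom Catalog implementation could probably be wired in 
like this. This is only a sketch: `custom_cat`, `com.example.MyCatalog`, the 
warehouse path, and `table_b` are hypothetical placeholders, and it assumes the 
`iceberg.catalog.<catalog_name>.catalog-impl` property is available in your 
Iceberg version:
   ```sql
   -- Hypothetical catalog name and implementation class; replace with your own
   SET iceberg.catalog.custom_cat.catalog-impl=com.example.MyCatalog;
   SET iceberg.catalog.custom_cat.warehouse=hdfs://example.com:8020/custom_cat;
   
   -- Point the Hive table at that catalog instead of HiveCatalog
   CREATE EXTERNAL TABLE database_a.table_b
   STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
   TBLPROPERTIES ('iceberg.catalog'='custom_cat');
   ```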
   
   Back to the original question:
   - I think the external table approach is better if you want the table in the 
target catalog to follow the changes of the table in the source catalog. If 
there is a change committed to the original table, then the queries against the 
Hive table will immediately reflect the changes.
   - OTOH if you want to access a specific snapshot of the table then 
`registerTable` could be useful. In this case you should make sure that the 
original catalog will not expire or remove files used by the tables 
registered in the HiveCatalog. (I still would prefer to see a feature in Hive 
for providing a specific snapshotId or metadata file location of a table in 
table properties, instead of registering a table in HiveCatalog.)

