massdosage commented on a change in pull request #1837:
URL: https://github.com/apache/iceberg/pull/1837#discussion_r531625984



##########
File path: site/docs/hive.md
##########
@@ -84,7 +84,32 @@ You should now be able to issue Hive SQL `SELECT` queries 
using the above table
 SELECT * from table_b;
 ```
 
+#### Using Hadoop Catalog
+Iceberg tables created using `HadoopCatalog` are stored entirely in a 
directory in a filesystem like HDFS. 
+
+##### Create an Iceberg table
+The first step is to create an Iceberg table using the Spark/Java/Python API 
and `HadoopCatalog`. For the purposes of this documentation we will assume that 
the table is called `database_a.table_c` and that the table location is 
`hdfs://some_path/database_a/table_c`.
+
+##### Create a Hive table
+Now overlay a Hive table on top of this Iceberg table by issuing Hive DDL like 
so:
+```sql
+CREATE EXTERNAL TABLE table_a 
+STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
+LOCATION 'hdfs://some_bucket/some_path/database_a/table_c';
+```
+
+##### Query the Iceberg table via Hive
+TODO: why does below work if no config settings are set in Hive but fails if 
we add `set iceberg.mr.catalog=hadoop` like the code suggests we need to do?

Review comment:
       @shardulm94 I agree with you. At the moment I'm just documenting how it 
works currently, but the idea is for all of us to step back from the 
implementation details, look at it through the lens of an end user, and make 
changes accordingly. In our earlier implementation we stored the catalog type 
as a table property so the end user didn't have to specify it; I'm not sure 
how this got lost along the way. I'll test shortly whether this still works. 
I also think we should rename the property so that it doesn't have `mr` in it. 
As @pvary says, there is some discussion on the mailing list about this, which 
should lead to us agreeing on changes to how this is implemented and then 
updating these docs accordingly.
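
To make the table-property idea concrete, here is a rough sketch of what the 
DDL could look like from an end user's point of view. The property name 
`iceberg.catalog` is purely hypothetical (it only assumes a name without `mr` 
in it), and this is not a description of how the storage handler behaves 
today; the table name and location are the placeholders from the diff above.

```sql
-- Sketch only: 'iceberg.catalog' is a hypothetical property name, not a
-- confirmed setting. The idea is that the catalog type travels with the
-- table definition instead of being set per session by the end user.
CREATE EXTERNAL TABLE table_a
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://some_bucket/some_path/database_a/table_c'
TBLPROPERTIES ('iceberg.catalog'='hadoop');

-- With the catalog type stored on the table, the query would need no
-- additional `set iceberg.mr.catalog=...` in the Hive session.
SELECT * FROM table_a;
```

Whether the value should be a catalog type, a catalog name, or something else 
is exactly the kind of thing the mailing list discussion needs to settle.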




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


