[GitHub] [iceberg] massdosage commented on a change in pull request #1837: Hive read path documentation for HadoopCatalog tables

GitBox Thu, 26 Nov 2020 09:03:12 -0800


massdosage commented on a change in pull request #1837:
URL: https://github.com/apache/iceberg/pull/1837#discussion_r531151067




##########
File path: site/docs/hive.md
##########
@@ -84,7 +84,32 @@ You should now be able to issue Hive SQL `SELECT` queries using the above table
 SELECT * from table_b;
 ```
 
+#### Using Hadoop Catalog
+Iceberg tables created using `HadoopCatalog` are stored entirely in a directory in a filesystem like HDFS.
+
+##### Create an Iceberg table
+The first step is to create an Iceberg table using the Spark/Java/Python API and `HadoopCatalog`. For the purposes of this documentation we will assume that the table is called `database_a.table_c` and that the table location is `hdfs://some_bucket/some_path/database_a/table_c`.
+
+##### Create a Hive table
+Now overlay a Hive table on top of this Iceberg table by issuing Hive DDL like so:
+```sql
+CREATE EXTERNAL TABLE table_c
+STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
+LOCATION 'hdfs://some_bucket/some_path/database_a/table_c';
+```
+
+##### Query the Iceberg table via Hive
+TODO: why does the query below work when no config settings are set in Hive, but fail when we add `set iceberg.mr.catalog=hadoop` as the code suggests we need to?
+
+You should now be able to issue Hive SQL `SELECT` queries using the above table and see the results returned from the underlying Iceberg table. Both the MapReduce and Tez query execution engines are supported.
+```sql
+SELECT * from table_c;
+```
+
 ### Features
 
 #### Predicate pushdown
 Pushdown of the Hive SQL `WHERE` clause has been implemented so that these filters are used at the Iceberg TableScan level as well as by the Parquet and ORC Readers.
+
+#### Column Projection

Review comment:
       This isn't related to the HadoopCatalog but since 
https://github.com/apache/iceberg/pull/1417 was just merged I thought I might 
as well just put it here.
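
For anyone checking the paths in the new section, here is a minimal sketch of how the Hive `LOCATION` lines up with the table directory. It assumes that a `HadoopCatalog` table lives at `<warehouse>/<database>/<table>`, and it takes `hdfs://some_bucket/some_path` (from the DDL in the diff) as the warehouse root. The helper names `hadoop_table_location` and `hive_overlay_ddl` are hypothetical and not part of Iceberg.

```python
def hadoop_table_location(warehouse: str, database: str, table: str) -> str:
    """Build the table directory, assuming a <warehouse>/<database>/<table> layout."""
    return "/".join([warehouse.rstrip("/"), database, table])


def hive_overlay_ddl(hive_table: str, location: str) -> str:
    """Render the Hive DDL from the docs for a given table name and location."""
    return (
        f"CREATE EXTERNAL TABLE {hive_table}\n"
        "STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'\n"
        f"LOCATION '{location}';"
    )


# Assumed warehouse root, taken from the LOCATION clause in the diff.
location = hadoop_table_location(
    "hdfs://some_bucket/some_path", "database_a", "table_c"
)
ddl = hive_overlay_ddl("table_c", location)
print(ddl)
```

Generating both strings from the same inputs keeps the documented table location, the `LOCATION` clause, and the table name used in the later `SELECT` in agreement.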



