[GitHub] [iceberg] rdblue commented on a change in pull request #2544: Doc: refactor Hive documentation with catalog loading examples

GitBox Sun, 02 May 2021 14:06:45 -0700


rdblue commented on a change in pull request #2544:
URL: https://github.com/apache/iceberg/pull/2544#discussion_r624755187




##########
File path: site/docs/hive.md
##########
@@ -17,117 +17,297 @@
 
 # Hive
 
-## Hive read support
-Iceberg supports the reading of Iceberg tables from [Hive](https://hive.apache.org) by using a [StorageHandler](https://cwiki.apache.org/confluence/display/Hive/StorageHandlers). Please note that only Hive 2.x versions are currently supported.
+Iceberg supports reading and writing Iceberg tables through [Hive](https://hive.apache.org) by using a [StorageHandler](https://cwiki.apache.org/confluence/display/Hive/StorageHandlers).
+Here is the current compatibility matrix for Iceberg Hive support: 
 
-### Table creation
-This section explains the various steps needed in order to overlay a Hive table "on top of" an existing Iceberg table. Iceberg tables are created using either a [`Catalog`](./javadoc/master/index.html?org/apache/iceberg/catalog/Catalog.html) or an implementation of the [`Tables`](./javadoc/master/index.html?org/apache/iceberg/Tables.html) interface and Hive needs to be configured accordingly to read data from these different types of table.
+| Feature                  | Hive 2.x               | Hive 3.1.2             |
+| ------------------------ | ---------------------- | ---------------------- |
+| CREATE EXTERNAL TABLE    | ✔️                     | ✔️                     |
+| CREATE TABLE             | ✔️                     | ✔️                     |
+| DROP TABLE               | ✔️                     | ✔️                     |
+| SELECT                   | ✔️ (MapReduce and Tez) | ✔️ (MapReduce and Tez) |
+| INSERT INTO              | ✔️ (MapReduce only)    | ✔️ (MapReduce only)    |
 
-#### Add the Iceberg Hive Runtime jar file to the Hive classpath
-Regardless of the table type, the `HiveIcebergStorageHandler` and supporting classes need to be made available on Hive's classpath. These are provided by the `iceberg-hive-runtime` jar file. For example, if using the Hive shell, this can be achieved by issuing a statement like so:
-```sql
-add jar /path/to/iceberg-hive-runtime.jar;
-```
-There are many others ways to achieve this including adding the jar file to Hive's auxiliary classpath (so it is available by default) - please refer to Hive's documentation for more information.
+## Enabling Iceberg support in Hive
 
-#### Using Hadoop Tables
-Iceberg tables created using `HadoopTables` are stored entirely in a directory in a filesystem like HDFS.
+### Loading runtime jar
 
-##### Create an Iceberg table
-The first step is to create an Iceberg table using the Spark/Java/Python API and `HadoopTables`. For the purposes of this documentation we will assume that the table is called `table_a` and that the table location is `hdfs://some_path/table_a`.
+To enable Iceberg support in Hive, the `HiveIcebergStorageHandler` and supporting classes need to be made available on Hive's classpath. 
+These are provided by the `iceberg-hive-runtime` jar file. 
+For example, if using the Hive shell, this can be achieved by issuing a statement like so:
 
-##### Create a Hive table
-Now overlay a Hive table on top of this Iceberg table by issuing Hive DDL like so:
-```sql
-CREATE EXTERNAL TABLE table_a 
-STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
-LOCATION 'hdfs://some_bucket/some_path/table_a';
 ```
-
-#### Query the Iceberg table via Hive
-You should now be able to issue Hive SQL `SELECT` queries using the above table and see the results returned from the underlying Iceberg table.
-```sql
-SELECT * from table_a;
+add jar /path/to/iceberg-hive-runtime.jar;
 ```
 
-#### Using Hive Catalog
-Iceberg tables created using `HiveCatalog` are automatically registered with Hive.
+There are many others ways to achieve this including adding the jar file to Hive's auxiliary classpath so it is available by default.
+Please refer to Hive's documentation for more information.
 
-##### Create an Iceberg table
-The first step is to create an Iceberg table using the Spark/Java/Python API and `HiveCatalog`. For the purposes of this documentation we will assume that the table is called `table_b` and that the table location is `s3://some_path/table_b`. In order for Iceberg to correctly set up the Hive table for querying some configuration values need to be set, the two options for this are described below - you can use either or the other depending on your use case.
+### Enabling support
+
+#### Hadoop configuration
+
+The value `iceberg.engine.hive.enabled` needs to be set to `true` in the Hadoop configuration in the environment.

Review comment:
       I don't think that "needs to be set" and "Hadoop configuration" are clear. Why does it need to be set? Which Hadoop configuration?
   
   Using the context I added above, I think this should be:
   
   > To enable Hive support globally for an application, set `iceberg.engine.hive.enabled=true` in its Hadoop configuration. For example, setting this in the `hive-site.xml` loaded by Spark will enable the storage handler for all tables created by Spark.
   
   I wouldn't recommend setting it programmatically. If you do that, you will very likely end up with environments that are not uniform, so the storage handler gets set and then unset by different applications.
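   
   For illustration, a minimal sketch of what that setting could look like as a standard Hadoop `<property>` entry. It assumes the application (Spark, in the example above) picks up this `hive-site.xml` from its configuration directory; the file location and the surrounding `<configuration>` contents depend on the deployment and are not taken from the PR:
   
   ```xml
   <!-- hive-site.xml: assumed to be the copy loaded by the application -->
   <configuration>
     <!-- Enables the Iceberg Hive storage handler for tables created by this application -->
     <property>
       <name>iceberg.engine.hive.enabled</name>
       <value>true</value>
     </property>
   </configuration>
   ```
   
   Keeping the property in a shared configuration file rather than in application code is one way to keep environments uniform.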
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

