rdsr commented on a change in pull request #1748:
URL: https://github.com/apache/iceberg/pull/1748#discussion_r522302920
##########
File path: site/docs/hive.md
##########
@@ -50,6 +50,38 @@ You should now be able to issue Hive SQL `SELECT` queries using the above table
 SELECT * from table_a;
 ```
+#### Using Hive Catalog
+Iceberg tables created using `HiveCatalog` are automatically registered with Hive.
+
+##### Create an Iceberg table
+The first step is to create an Iceberg table using the Spark/Java/Python API and `HiveCatalog`. For the purposes of this documentation we will assume that the table is called `table_b` and that the table location is `s3://some_path/table_b`. In order for Iceberg to correctly set up the Hive table for querying, some configuration values need to be set. The two options for this are described below; you can use one or the other depending on your use case.
+
+##### Hive Configuration
+The value `iceberg.engine.hive.enabled` needs to be set to `true` and added to the Hive configuration file on the classpath of the application creating the table. This can be done by modifying the relevant `hive-site.xml`. Alternatively, this can be done programmatically like so:
Review comment:
       I assume we can also query existing Iceberg tables through Hive by setting the `iceberg.engine.hive.enabled` flag? Do you think it's worth calling that out?
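A possible answer, sketched: since `engine.hive.enabled` is an ordinary table property, an existing table could presumably be made Hive-readable after the fact through Iceberg's `UpdateProperties` API. This is an unverified sketch, reusing the `catalog` and `tableId` names from the examples in this PR:

```java
// Hedged sketch (not from this PR): enable Hive reads on an existing table
// by setting engine.hive.enabled=true via the UpdateProperties API.
Table table = catalog.loadTable(tableId);
table.updateProperties()
    .set(TableProperties.ENGINE_HIVE_ENABLED, "true") // engine.hive.enabled=true
    .commit();
```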
##########
File path: site/docs/hive.md
##########
@@ -50,6 +50,38 @@ You should now be able to issue Hive SQL `SELECT` queries using the above table
 SELECT * from table_a;
 ```
+#### Using Hive Catalog
+Iceberg tables created using `HiveCatalog` are automatically registered with Hive.
+
+##### Create an Iceberg table
+The first step is to create an Iceberg table using the Spark/Java/Python API and `HiveCatalog`. For the purposes of this documentation we will assume that the table is called `table_b` and that the table location is `s3://some_path/table_b`. In order for Iceberg to correctly set up the Hive table for querying, some configuration values need to be set. The two options for this are described below; you can use one or the other depending on your use case.
+
+##### Hive Configuration
+The value `iceberg.engine.hive.enabled` needs to be set to `true` and added to the Hive configuration file on the classpath of the application creating the table. This can be done by modifying the relevant `hive-site.xml`. Alternatively, this can be done programmatically like so:
+```java
+Configuration hadoopConfiguration = spark.sparkContext().hadoopConfiguration();
+hadoopConfiguration.set(ConfigProperties.ENGINE_HIVE_ENABLED, "true"); // iceberg.engine.hive.enabled=true
+HiveCatalog catalog = new HiveCatalog(hadoopConfiguration);
+...
+catalog.createTable(tableId, schema, spec);
+```
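For reference, the corresponding `hive-site.xml` entry would presumably look like the following sketch; only the property name `iceberg.engine.hive.enabled` comes from the text above:

```xml
<property>
  <name>iceberg.engine.hive.enabled</name>
  <value>true</value>
</property>
```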
+
+##### Table Property Configuration
+The property `engine.hive.enabled` needs to be set to `true` and added to the table properties when creating the Iceberg table. This can be done like so:
+```java
+Map<String, String> tableProperties = new HashMap<String, String>();
+tableProperties.put(TableProperties.ENGINE_HIVE_ENABLED, "true"); // engine.hive.enabled=true
+catalog.createTable(tableId, schema, spec, tableProperties);
+```
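Once the property is set at create time, querying from Hive should presumably mirror the `table_a` example earlier in the doc (assuming the `table_b` name from above):

```sql
SELECT * FROM table_b;
```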
+
+#### Query the Iceberg table via Hive
+TODO: tables created by the above can't just be read "as is"; need to document the steps needed in order to be able to query them here.
Review comment:
       This seems to contradict the following lines. Is this TODO still valid?
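If manual registration does turn out to be required for some setups, the Hive-side DDL would presumably look something like this sketch. The storage handler class is from Iceberg's `iceberg-mr` module and the location is the example path from the doc, but neither step is confirmed by this PR:

```sql
CREATE EXTERNAL TABLE table_b
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 's3://some_path/table_b';
```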
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]