rdsr commented on a change in pull request #1748:
URL: https://github.com/apache/iceberg/pull/1748#discussion_r522302920
##########
File path: site/docs/hive.md
##########
@@ -50,6 +50,38 @@ You should now be able to issue Hive SQL `SELECT` queries using the above table
 SELECT * from table_a;
 ```
+#### Using Hive Catalog
+Iceberg tables created using `HiveCatalog` are automatically registered with Hive.
+
+##### Create an Iceberg table
+The first step is to create an Iceberg table using the Spark/Java/Python API and `HiveCatalog`. For the purposes of this documentation we will assume that the table is called `table_b` and that the table location is `s3://some_path/table_b`. In order for Iceberg to correctly set up the Hive table for querying, some configuration values need to be set. The two options for this are described below; you can use one or the other depending on your use case.
+
+##### Hive Configuration
+The value `iceberg.engine.hive.enabled` needs to be set to `true` and added to the Hive configuration file on the classpath of the application creating the table. This can be done by modifying the relevant `hive-site.xml`. Alternatively, this can be done programmatically like so:
Review comment:
       I assume we can also query existing Iceberg tables through Hive by setting the `iceberg.engine.hive.enabled` flag? Do you think it's worth calling that out?
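A possible answer, sketched: since `engine.hive.enabled` is an ordinary table property, an existing table could presumably be made Hive-readable after the fact through Iceberg's `UpdateProperties` API. This is an unverified sketch, reusing the `catalog` and `tableId` names from the examples in this PR:

```java
// Hedged sketch (not from this PR): enable Hive reads on an existing table
// by setting engine.hive.enabled=true via the UpdateProperties API.
Table table = catalog.loadTable(tableId);
table.updateProperties()
    .set(TableProperties.ENGINE_HIVE_ENABLED, "true") // engine.hive.enabled=true
    .commit();
```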
##########
File path: site/docs/hive.md
##########
@@ -50,6 +50,38 @@ You should now be able to issue Hive SQL `SELECT` queries using the above table
 SELECT * from table_a;
 ```
+#### Using Hive Catalog
+Iceberg tables created using `HiveCatalog` are automatically registered with Hive.
+
+##### Create an Iceberg table
+The first step is to create an Iceberg table using the Spark/Java/Python API and `HiveCatalog`. For the purposes of this documentation we will assume that the table is called `table_b` and that the table location is `s3://some_path/table_b`. In order for Iceberg to correctly set up the Hive table for querying, some configuration values need to be set. The two options for this are described below; you can use one or the other depending on your use case.
+
+##### Hive Configuration
+The value `iceberg.engine.hive.enabled` needs to be set to `true` and added to the Hive configuration file on the classpath of the application creating the table. This can be done by modifying the relevant `hive-site.xml`. Alternatively, this can be done programmatically like so:
+```java
+Configuration hadoopConfiguration = spark.sparkContext().hadoopConfiguration();
+hadoopConfiguration.set(ConfigProperties.ENGINE_HIVE_ENABLED, "true"); // iceberg.engine.hive.enabled=true
+HiveCatalog catalog = new HiveCatalog(hadoopConfiguration);
+...
+catalog.createTable(tableId, schema, spec);
+```
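For reference, the corresponding `hive-site.xml` entry would presumably look like the following sketch; only the property name `iceberg.engine.hive.enabled` comes from the text above:

```xml
<property>
  <name>iceberg.engine.hive.enabled</name>
  <value>true</value>
</property>
```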
+
+##### Table Property Configuration
+The property `engine.hive.enabled` needs to be set to `true` and added to the table properties when creating the Iceberg table. This can be done like so:
+```java
+Map<String, String> tableProperties = new HashMap<String, String>();
+tableProperties.put(TableProperties.ENGINE_HIVE_ENABLED, "true"); // engine.hive.enabled=true
+catalog.createTable(tableId, schema, spec, tableProperties);
+```
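Once the property is set at create time, querying from Hive should presumably mirror the `table_a` example earlier in the doc (assuming the `table_b` name from above):

```sql
SELECT * FROM table_b;
```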
+
+#### Query the Iceberg table via Hive
+TODO: tables created by the above can't just be read "as is"; need to document the steps needed in order to be able to query them here.
Review comment:
       This seems to contradict the following lines. Is this TODO still valid?
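If manual registration does turn out to be required for some setups, the Hive-side DDL would presumably look something like this sketch. The storage handler class is from Iceberg's `iceberg-mr` module and the location is the example path from the doc, but neither step is confirmed by this PR:

```sql
CREATE EXTERNAL TABLE table_b
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 's3://some_path/table_b';
```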
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]