This is an automated email from the ASF dual-hosted git repository.
pvary pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/main by this push:
new 665baa5ac2 Doc: Remove Hive 2.x/3.x related docs in hive.md (#12700)
665baa5ac2 is described below
commit 665baa5ac23f7743642ff51d6b7594952e60198e
Author: jackylee <[email protected]>
AuthorDate: Wed Apr 9 15:29:24 2025 +0800
Doc: Remove Hive 2.x/3.x related docs in hive.md (#12700)
---
docs/docs/hive.md | 94 +++++++++------------------------------
site/docs/multi-engine-support.md | 8 ++--
2 files changed, 24 insertions(+), 78 deletions(-)
diff --git a/docs/docs/hive.md b/docs/docs/hive.md
index 5810feffaf..3abf8d89ce 100644
--- a/docs/docs/hive.md
+++ b/docs/docs/hive.md
@@ -24,41 +24,20 @@ Iceberg supports reading and writing Iceberg tables through [Hive](https://hive.
a [StorageHandler](https://cwiki.apache.org/confluence/display/Hive/StorageHandlers).
## Feature support
-The following features matrix illustrates the support for different features across Hive releases for Iceberg tables -
-
-| Feature support | Hive 2 / 3 | Hive 4 |
-|-----------------------------------------------------------------|------------|--------|
-| [SQL create table](#create-table) | ✔️ | ✔️ |
-| [SQL create table as select (CTAS)](#create-table-as-select) | ✔️ | ✔️ |
-| [SQL create table like table (CTLT)](#create-table-like-table) | ✔️ | ✔️ |
-| [SQL drop table](#drop-table) | ✔️ | ✔️ |
-| [SQL insert into](#insert-into) | ✔️ | ✔️ |
-| [SQL insert overwrite](#insert-overwrite) | ✔️ | ✔️ |
-| [SQL delete from](#delete-from) | | ✔️ |
-| [SQL update](#update) | | ✔️ |
-| [SQL merge into](#merge-into) | | ✔️ |
-| [Branches and tags](#branches-and-tags) | | ✔️ |
-
-Iceberg compatibility with Hive 2.x and Hive 3.1.2/3 supports the following features:
-
-* Creating a table
-* Dropping a table
-* Reading a table
-* Inserting into a table (INSERT INTO)
-!!! warning
- DML operations work only with MapReduce execution engine.
-
-Hive supports the following additional features with Hive version 4.0.0 and above:
+Hive supports the following features with Hive version 4.0.0 and above:
-* Creating an Iceberg identity-partitioned table
-* Creating an Iceberg table with any partition spec, including the various transforms supported by Iceberg
-* Creating a table from an existing table (CTAS table)
-* Altering a table while keeping Iceberg and Hive schemas in sync
-* Altering the partition schema (updating columns)
-* Altering the partition schema by specifying partition transforms
+* Creating an Iceberg table.
+* Creating an Iceberg identity-partitioned table.
+* Creating an Iceberg table with any partition spec, including the various transforms supported by Iceberg.
+* Creating a table from an existing table (CTAS table).
+* Dropping a table.
+* Altering a table while keeping Iceberg and Hive schemas in sync.
+* Altering the partition schema (updating columns).
+* Altering the partition schema by specifying partition transforms.
* Truncating a table / partition, dropping a partition.
-* Migrating tables in Avro, Parquet, or ORC (Non-ACID) format to Iceberg
+* Migrating tables in Avro, Parquet, or ORC (Non-ACID) format to Iceberg.
+* Reading an Iceberg table.
* Reading the schema of a table.
* Querying Iceberg metadata tables.
* Time travel applications.
@@ -66,11 +45,11 @@ Hive supports the following additional features with Hive version 4.0.0 and abov
* Inserting data overwriting existing data (INSERT OVERWRITE) in a table / partition.
* Copy-on-write support for delete, update and merge queries, CRUD support for Iceberg V1 tables.
* Altering a table with expiring snapshots.
-* Create a table like an existing table (CTLT table)
-* Support adding parquet compression type via Table properties [Compression types](https://spark.apache.org/docs/2.4.3/sql-data-sources-parquet.html#configuration)
+* Create a table like an existing table (CTLT table).
+* Support adding parquet compression type via Table properties [Compression types](https://spark.apache.org/docs/2.4.3/sql-data-sources-parquet.html#configuration).
* Altering a table metadata location.
* Supporting table rollback.
-* Honors sort orders on existing tables when writing a table [Sort orders specification](../../spec.md#sort-orders)
+* Honors sort orders on existing tables when writing a table [Sort orders specification](../../spec.md#sort-orders).
* Creating, writing to and dropping an Iceberg branch / tag.
* Allowing expire snapshots by Snapshot ID, by time range, by retention of last N snapshots and using table properties.
* Set current snapshot using snapshot ID for an Iceberg table.
@@ -89,29 +68,14 @@ Hive supports the following additional features with Hive version 4.0.0 and abov
## Enabling Iceberg support in Hive
-Hive 4 comes with `hive-iceberg` that ships Iceberg, so no additional downloads or jars are needed. For older versions of Hive a runtime jar has to be added.
+Starting from 1.8.0 Iceberg doesn't release Hive runtime connector. For Hive query engine integration (specifically
+with Hive 2.x and 3.x) use Hive runtime connector coming with Iceberg 1.6.1, or use Hive 4.0.0 or later
+which is released with embedded Iceberg integration.
### Hive 4.0.x
Hive 4.0.x comes with Iceberg 1.4.3 included.
-### Hive 2.3.x, Hive 3.1.x
-
-In order to use Hive 2.3.x or Hive 3.1.x, you must load the Iceberg-Hive runtime jar and enable Iceberg support, either globally or for an individual table using a table property.
-
-#### Loading runtime jar
-
-To enable Iceberg support in Hive, the `HiveIcebergStorageHandler` and supporting classes need to be made available on
-Hive's classpath. These are provided by the `iceberg-hive-runtime` jar file. For example, if using the Hive shell, this
-can be achieved by issuing a statement like so:
-
-```
-add jar /path/to/iceberg-hive-runtime.jar;
-```
-
-There are many others ways to achieve this including adding the jar file to Hive's auxiliary classpath so it is
-available by default. Please refer to Hive's documentation for more information.
-
#### Enabling support
If the Iceberg storage handler is not in Hive's classpath, then Hive cannot load or update the metadata for an Iceberg
@@ -126,9 +90,6 @@ To enable Hive support globally for an application, set `iceberg.engine.hive.ena
For example, setting this in the `hive-site.xml` loaded by Spark will enable the storage handler for all tables created
by Spark.
-!!! danger
- Starting with Apache Iceberg `0.11.0`, when using Hive with Tez you also have to disable vectorization (`hive.vectorized.execution.enabled=false`).
-
##### Table property configuration
Alternatively, the property `engine.hive.enabled` can be set to `true` and added to the table properties when creating
@@ -143,19 +104,12 @@ Catalog catalog=...;
The table level configuration overrides the global Hadoop configuration.
-##### Hive on Tez configuration
-
-To use the Tez engine on Hive `3.1.2` or later, Tez needs to be upgraded to >= `0.10.1` which contains a necessary fix [TEZ-4248](https://issues.apache.org/jira/browse/TEZ-4248).
-
-To use the Tez engine on Hive `2.3.x`, you will need to manually build Tez from the `branch-0.9` branch due to a
-backwards incompatibility issue with Tez `0.10.1`.
-
-In both cases, you will also need to set the following property in the `tez-site.xml` configuration file: `tez.mrreader.config.update.properties=hive.io.file.readcolumn.names,hive.io.file.readcolumn.ids`.
-
## Catalog Management
### Global Hive catalog
+HiveCatalog integration supports Hive 2.3.10 or 3.1.3 or later.
+
From the Hive engine's perspective, there is only one global data catalog that is defined in the Hadoop configuration in
the runtime environment. In contrast, Iceberg supports multiple different data catalog types such as Hive, Hadoop, AWS
Glue, or custom catalog implementations. Iceberg also allows loading a table directly based on its path in the file
@@ -213,12 +167,6 @@ SET iceberg.catalog.glue.lock.table=myGlueLockTable;
## DDL Commands
-Not all the features below are supported with Hive 2.3.x and Hive 3.1.x. Please refer to the
-[Feature support](#feature-support) paragraph for further details.
-
-One generally applicable difference is that Hive 4 provides the possibility to use
-`STORED BY ICEBERG` instead of the old `STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'`
-
### CREATE TABLE
#### Non partitioned tables
@@ -601,9 +549,7 @@ Here are the features highlights for Iceberg Hive read support:
1. **Predicate pushdown**: Pushdown of the Hive SQL `WHERE` clause has been implemented so that these filters are used at the Iceberg `TableScan` level as well as by the Parquet and ORC Readers.
2. **Column projection**: Columns from the Hive SQL `SELECT` clause are projected down to the Iceberg readers to reduce the number of columns read.
-3. **Hive query engines**:
- - With Hive 2.3.x, 3.1.x, both the MapReduce and Tez query execution engines are supported.
- - With Hive 4.x, the Tez query execution engine is supported.
+3. **Hive query engines**: With Hive 4.x, the Tez query execution engine is supported.
Some of the advanced / little used optimizations are not yet implemented for Iceberg tables, so you should check your individual queries.
Also currently the statistics stored in the MetaStore are used for query planning. This is something we are planning to improve in the future.
diff --git a/site/docs/multi-engine-support.md b/site/docs/multi-engine-support.md
index e791be4226..be3bc02a7c 100644
--- a/site/docs/multi-engine-support.md
+++ b/site/docs/multi-engine-support.md
@@ -103,10 +103,10 @@ Users should continuously upgrade their Flink version to stay up-to-date.
<!-- markdown-link-check-disable -->
-| Version | Recommended minor version | Lifecycle Stage | Initial Iceberg Support | Latest Iceberg Support | Latest Runtime Jar |
-| -------------- | ------------------------- | ----------------- | ----------------------- | ---------------------- | ------------------ |
-| 2 | 2.3.8 | Deprecated | 0.8.0-incubating | 1.7.2 | [iceberg-hive-runtime](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-hive-runtime/1.7.2/iceberg-hive-runtime-1.7.2.jar) |
-| 3 | 3.1.2 | Deprecated | 0.10.0 | 1.7.2 | [iceberg-hive-runtime](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-hive-runtime/1.7.2/iceberg-hive-runtime-1.7.2.jar) |
+| Version | Recommended minor version | Lifecycle Stage | Initial Iceberg Support | Latest Iceberg Support | Latest Runtime Jar |
+| -------------- | ------------------------- | ----------------- | ----------------------- |------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
+| 2 | 2.3.8 | Deprecated | 0.8.0-incubating | 1.6.1 | [iceberg-hive-runtime](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-hive-runtime/1.6.1/iceberg-hive-runtime-1.6.1.jar) |
+| 3 | 3.1.2 | Deprecated | 0.10.0 | 1.6.1 | [iceberg-hive-runtime](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-hive-runtime/1.6.1/iceberg-hive-runtime-1.6.1.jar) |
<!-- markdown-link-check-enable -->
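
Reader's note (not part of the commit): the updated table rows pin Hive 2 and Hive 3 to the `iceberg-hive-runtime` 1.6.1 jar, while Hive 4.0.0 and later ship with Iceberg embedded. A minimal Python sketch of that selection rule follows, assuming only the version numbers and the Maven URL pattern shown in the diff; the function name `hive_runtime_jar_url` and the constants are hypothetical and exist only for illustration.

```python
from typing import Optional

# Assumption: 1.6.1 is the runtime version the updated table lists for Hive 2 and 3.
LAST_HIVE_RUNTIME_RELEASE = "1.6.1"

# URL prefix copied from the "Latest Runtime Jar" links in the diff.
MAVEN_BASE = (
    "https://search.maven.org/remotecontent"
    "?filepath=org/apache/iceberg/iceberg-hive-runtime"
)


def hive_runtime_jar_url(hive_major: int) -> Optional[str]:
    """Return the runtime jar URL for Hive 2/3, or None for Hive 4+.

    Hypothetical helper: None means no separate jar is needed because
    Hive 4.0.0 and later are released with embedded Iceberg integration.
    """
    if hive_major in (2, 3):
        version = LAST_HIVE_RUNTIME_RELEASE
        return f"{MAVEN_BASE}/{version}/iceberg-hive-runtime-{version}.jar"
    if hive_major >= 4:
        return None
    raise ValueError(f"unsupported Hive major version: {hive_major}")


if __name__ == "__main__":
    for major in (2, 3, 4):
        print(major, hive_runtime_jar_url(major))
```

The helper returns `None` rather than raising for Hive 4 so that callers can tell "nothing to download" apart from an unsupported version.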