[ https://issues.apache.org/jira/browse/HIVE-29507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Teddy Choi updated HIVE-29507:
------------------------------
Description:
h1. Issue
{quote}Starting from 1.8.0 Iceberg doesn't release Hive runtime connector. For
Hive query engine integration (specifically with Hive 2.x and 3.x) use Hive
runtime connector coming with Iceberg 1.6.1, or use Hive 4.0.0 or later which
is released with embedded Iceberg integration.
[https://iceberg.apache.org/docs/latest/hive/#feature-support]
{quote}
Apache Spark uses {{iceberg-spark-runtime}}, which relied on
{{HiveIcebergStorageHandler}} from {{iceberg-mr}} before Iceberg 1.8, but that
module is no longer available as of 1.8. Therefore, a slim
{{hive-iceberg-handler-core.jar}} file without shading is required for Hive 3.x
and Iceberg 1.8+. Apache Spark can load {{hive-iceberg-handler.jar}} together
with {{iceberg-spark-runtime.jar}}, but some classes exist in both JAR files,
which causes an {{InvalidClassException}}. For example:
{code:java}
java.io.InvalidClassException: org.apache.iceberg.BaseFile; local class
incompatible: stream classdesc serialVersionUID = 8569836863676564712, local
class serialVersionUID = -8072381884098305524{code}
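To see which copy of a class a JVM actually loaded, a small probe like the one below may help. It is only a diagnostic sketch and not part of the proposed fix: the class name {{SerialUidProbe}} and its {{describe}} helper are hypothetical, and it assumes the probe runs on the same classpath as the failing Spark job (for example with {{org.apache.iceberg.BaseFile}} as the argument).

```java
import java.io.ObjectStreamClass;
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Hive, Iceberg, or Spark.
public class SerialUidProbe {

  /** Describes the serialVersionUID and the origin of a class as seen by this JVM. */
  static String describe(Class<?> cls) {
    // lookup() returns null when the class is not serializable.
    ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
    String uid = desc == null
        ? "not serializable"
        : Long.toString(desc.getSerialVersionUID());

    // The code source is null for classes loaded by the bootstrap loader.
    CodeSource src = cls.getProtectionDomain().getCodeSource();
    String origin = src == null ? "bootstrap/unknown" : String.valueOf(src.getLocation());

    return cls.getName() + " serialVersionUID=" + uid + " loadedFrom=" + origin;
  }

  public static void main(String[] args) throws ClassNotFoundException {
    // Pass a fully qualified class name; defaults to a JDK class for a quick smoke test.
    String name = args.length > 0 ? args[0] : "java.util.ArrayList";
    System.out.println(describe(Class.forName(name)));
  }
}
```

If the reported {{serialVersionUID}} differs between the process that wrote the object and the one that read it, the {{loadedFrom}} value shows which JAR supplied the class on each side.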
h1. Fix
Create a slim hive-iceberg-handler core JAR file to avoid
{{InvalidClassException}}.
Before:
* iceberg-shading
** {{maven-shade-plugin}} shades Iceberg and other dependencies.
* iceberg-handler
** {{maven-dependency-plugin}} unpacks iceberg-shading and iceberg-catalog
then packs them together.
After:
* iceberg-shading
** {{maven-shade-plugin}} shades Iceberg and other dependencies.
* iceberg-handler
** {{maven-shade-plugin}} shades iceberg-shading and iceberg-catalog without
relocation, which results in the same JAR file that {{maven-dependency-plugin}}
produced.
** {{maven-jar-plugin}} creates a new slim JAR without shaded classes.
{{maven-dependency-plugin}} in {{iceberg-handler}} overwrites the class
directory, which affects {{maven-jar-plugin}}. A workaround is to use
{{<configuration><includes></includes></configuration>}}, but because many Java
packages are shared across artifacts, almost 100 individual class names would
have to be configured explicitly. A list of that size would be hard to maintain
whenever a class in those packages changes.
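One possible way to check that a slim JAR no longer shares classes with another JAR is to compare their {{.class}} entries. The following is a minimal sketch under that assumption: {{JarOverlap}} and its methods are hypothetical names, the JAR paths are supplied by the caller, and it only compares entry names, not class contents.

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical verification helper; not part of the Hive build.
public class JarOverlap {

  /** Returns the sorted names of all .class entries in the given JAR. */
  static Set<String> classEntries(String jarPath) throws IOException {
    try (JarFile jar = new JarFile(jarPath)) {
      return jar.stream()
          .map(JarEntry::getName)
          .filter(name -> name.endsWith(".class"))
          .collect(Collectors.toCollection(TreeSet::new));
    }
  }

  /** Returns the .class entries that appear in both JARs. */
  static Set<String> overlap(String jarA, String jarB) throws IOException {
    Set<String> common = classEntries(jarA);
    common.retainAll(classEntries(jarB));
    return common;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: JarOverlap <first.jar> <second.jar>");
      return;
    }
    // Prints one shared entry per line; no output means no overlap by name.
    overlap(args[0], args[1]).forEach(System.out::println);
  }
}
```

Running it against the slim JAR and {{iceberg-spark-runtime.jar}} should print nothing if the slim JAR really excludes the shaded classes; any printed entry is a candidate for the conflict described above.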
was:
h1. Issue
{quote}Starting from 1.8.0 Iceberg doesn't release Hive runtime connector. For
Hive query engine integration (specifically with Hive 2.x and 3.x) use Hive
runtime connector coming with Iceberg 1.6.1, or use Hive 4.0.0 or later which
is released with embedded Iceberg integration.
[https://iceberg.apache.org/docs/latest/hive/#feature-support]
{quote}
For Hive 3.x and Iceberg 1.8+, a slim {{hive-iceberg-handler-core.jar}} file
without shading is required. Apache Spark can load
{{hive-iceberg-handler.jar}} together with {{iceberg-spark-runtime.jar}}, but
some classes exist in both JAR files, which causes an
{{InvalidClassException}}. For example:
{code:java}
java.io.InvalidClassException: org.apache.iceberg.BaseFile; local class
incompatible: stream classdesc serialVersionUID = 8569836863676564712, local
class serialVersionUID = -8072381884098305524{code}
h1. Fix
Create a slim hive-iceberg-handler core JAR file to avoid
{{InvalidClassException}}.
Before:
* iceberg-shading
** {{maven-shade-plugin}} shades Iceberg and other dependencies.
* iceberg-handler
** {{maven-dependency-plugin}} unpacks iceberg-shading and iceberg-catalog
then packs them together.
After:
* iceberg-shading
** {{maven-shade-plugin}} shades Iceberg and other dependencies.
* iceberg-handler
** {{maven-shade-plugin}} shades iceberg-shading and iceberg-catalog without
relocation, which results in the same JAR file that {{maven-dependency-plugin}}
produced.
** {{maven-jar-plugin}} creates a new slim JAR without shaded classes.
{{maven-dependency-plugin}} in {{iceberg-handler}} overwrites the class
directory, which affects {{maven-jar-plugin}}. A workaround is to use
{{<configuration><includes></includes></configuration>}}, but because many Java
packages are shared across artifacts, almost 100 individual class names would
have to be configured explicitly. A list of that size would be hard to maintain
whenever a class in those packages changes.
> Create a slim hive-iceberg-handler core JAR
> -------------------------------------------
>
> Key: HIVE-29507
> URL: https://issues.apache.org/jira/browse/HIVE-29507
> Project: Hive
> Issue Type: New Feature
> Components: Iceberg integration
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Priority: Major