[https://issues.apache.org/jira/browse/HIVE-29507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Teddy Choi updated HIVE-29507:
------------------------------
Description:
h1. Issue
{{HiveIcebergStorageHandler}} was used for interoperability between Apache Hive 3.x and Apache Spark. The handler was packaged in {{iceberg-hive-runtime.jar}} up to Apache Iceberg 1.7, but the Hive runtime was removed in Apache Iceberg 1.8. Apache Hive also provides {{hive-iceberg-handler.jar}}, so Apache Spark can load {{hive-iceberg-handler.jar}} together with {{iceberg-spark-runtime.jar}} for the same interoperability use cases.
However, some classes are packaged in both {{iceberg-spark-runtime.jar}} and {{hive-iceberg-handler.jar}} with different definitions, whereas {{iceberg-spark-runtime.jar}} and {{iceberg-hive-runtime.jar}} had no such differences. The mismatch causes an {{InvalidClassException}} during Java deserialization. For example:
{code:java}
java.io.InvalidClassException: org.apache.iceberg.BaseFile; local class incompatible: stream classdesc serialVersionUID = 8569836863676564712, local class serialVersionUID = -8072381884098305524
{code}
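Java serialization compares the {{serialVersionUID}} recorded in the stream with the one of the locally loaded class. When a class does not declare one explicitly, the JVM computes it from the class structure (name, modifiers, interfaces, fields, methods), so two differently built copies of the same class can disagree. The following is a minimal diagnostic sketch, not part of the patch: it prints the effective {{serialVersionUID}} of a class and the location it was loaded from. The class {{SerialVersionCheck}} and its nested demo classes are hypothetical names chosen for illustration.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.security.CodeSource;

public class SerialVersionCheck {

  // Demo class with an explicit serialVersionUID: the value is fixed by the source code.
  static class Explicit implements Serializable {
    private static final long serialVersionUID = 42L;
    int value;
  }

  // Demo class without one: the JVM computes a hash of the class structure.
  static class Implicit implements Serializable {
    int value;
  }

  /** Returns the serialVersionUID the JVM uses when (de)serializing the given class. */
  static long effectiveUid(Class<?> cls) {
    ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
    if (desc == null) {
      throw new IllegalArgumentException(cls.getName() + " is not Serializable");
    }
    return desc.getSerialVersionUID();
  }

  /** Returns where the class was loaded from (e.g. a JAR URL), or a note if unknown. */
  static String origin(Class<?> cls) {
    CodeSource source = cls.getProtectionDomain().getCodeSource();
    return source == null ? "<bootstrap or unknown>" : String.valueOf(source.getLocation());
  }

  public static void main(String[] args) throws ClassNotFoundException {
    // With no arguments, inspect the demo classes; otherwise inspect the named classes,
    // e.g. org.apache.iceberg.BaseFile on the Spark driver and executor classpaths.
    if (args.length == 0) {
      args = new String[] {Explicit.class.getName(), Implicit.class.getName()};
    }
    for (String name : args) {
      Class<?> cls = Class.forName(name);
      System.out.println(name + " uid=" + effectiveUid(cls) + " from " + origin(cls));
    }
  }
}
```

Running it with the same class name on each side of the failing job would show whether the two {{serialVersionUID}} values come from different JAR files.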
h1. Fix
Create a slim hive-iceberg-handler core JAR, without the shaded classes, to avoid the {{InvalidClassException}}.
Before:
* iceberg-shading
** {{maven-shade-plugin}} shades Iceberg and other dependencies.
* iceberg-handler
** {{maven-dependency-plugin}} unpacks iceberg-shading and iceberg-catalog, then packs them together.
After:
* iceberg-shading
** {{maven-shade-plugin}} shades Iceberg and other dependencies.
* iceberg-handler
** {{maven-shade-plugin}} shades iceberg-shading and iceberg-catalog without relocation, which results in the same JAR file that {{maven-dependency-plugin}} produced.
** {{maven-jar-plugin}} creates a new slim JAR without shaded classes.
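To check the result, one can compare the {{.class}} entries of the slim JAR with those of {{iceberg-spark-runtime.jar}}; any entry present in both is a candidate for the conflict described above. The sketch below is a hypothetical helper ({{JarOverlap}} is an illustrative name), not part of the build, and the JAR paths passed to it are assumptions.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarOverlap {

  /** Collects the names of all .class entries in a JAR, e.g. "org/apache/iceberg/BaseFile.class". */
  static Set<String> classEntries(String jarPath) throws IOException {
    Set<String> names = new TreeSet<>();
    try (JarFile jar = new JarFile(jarPath)) {
      Enumeration<JarEntry> entries = jar.entries();
      while (entries.hasMoreElements()) {
        JarEntry entry = entries.nextElement();
        if (!entry.isDirectory() && entry.getName().endsWith(".class")) {
          names.add(entry.getName());
        }
      }
    }
    return names;
  }

  /** Returns the class entries that appear in both JARs, sorted by name. */
  static Set<String> overlap(String jarA, String jarB) throws IOException {
    Set<String> shared = classEntries(jarA);
    shared.retainAll(classEntries(jarB));
    return shared;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: JarOverlap <slim-handler.jar> <iceberg-spark-runtime.jar>");
      System.exit(2);
    }
    Set<String> shared = overlap(args[0], args[1]);
    shared.forEach(System.out::println);
    System.out.println(shared.size() + " class entries in both JARs");
  }
}
```

Comparing entry names only is a coarse check: it does not tell whether two same-named classes are actually incompatible, only that both JARs ship them.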
Keeping {{maven-dependency-plugin}} in {{iceberg-handler}} does not work for the slim JAR: it overwrites the class directory with the unpacked dependency classes, so {{maven-jar-plugin}} would package those classes too. Restricting the JAR with {{<configuration><includes></includes></configuration>}} would avoid this, but because many Java packages are shared across artifacts, almost 100 individual class names would have to be listed explicitly. That list would be hard to maintain whenever a class in those packages changes.
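The maintenance cost of the {{includes}} approach can be illustrated with a small sketch: a package whose classes all belong to the handler could be covered by one wildcard, but a package shared with another artifact needs one entry per handler class file, since a wildcard would also pick up the other artifact's classes unpacked into the same directory. The helper {{IncludePatterns}} and all class entries below are hypothetical, and the {{pkg/*.class}} pattern form assumes the usual Ant-style path patterns accepted by {{maven-jar-plugin}}; this is an illustration only, not the build logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class IncludePatterns {

  /** "org/example/handler/Foo.class" -> "org/example/handler"; "" for the default package. */
  static String packageOf(String classEntry) {
    int slash = classEntry.lastIndexOf('/');
    return slash < 0 ? "" : classEntry.substring(0, slash);
  }

  /**
   * Builds include patterns that keep only the handler's own class files.
   * A package used by no other artifact gets one wildcard; a shared package
   * needs one pattern per class file.
   */
  static List<String> includes(Set<String> handlerClasses, Set<String> otherClasses) {
    Set<String> otherPackages = new TreeSet<>();
    for (String entry : otherClasses) {
      otherPackages.add(packageOf(entry));
    }
    // Group the handler's class files by package; TreeMap/TreeSet keep the output sorted.
    Map<String, Set<String>> byPackage = new TreeMap<>();
    for (String entry : handlerClasses) {
      byPackage.computeIfAbsent(packageOf(entry), pkg -> new TreeSet<>()).add(entry);
    }
    List<String> patterns = new ArrayList<>();
    for (Map.Entry<String, Set<String>> group : byPackage.entrySet()) {
      String pkg = group.getKey();
      if (otherPackages.contains(pkg)) {
        patterns.addAll(group.getValue()); // shared package: list every class file
      } else {
        patterns.add(pkg.isEmpty() ? "*.class" : pkg + "/*.class"); // handler-only package
      }
    }
    return patterns;
  }

  public static void main(String[] args) {
    // Hypothetical class entries; a real run would read them from the build output and JARs.
    Set<String> handler = new TreeSet<>(Arrays.asList(
        "org/example/handler/StorageHandler.class",
        "org/example/handler/SerDe.class",
        "org/example/shared/HandlerUtil.class"));
    Set<String> other = new TreeSet<>(Arrays.asList("org/example/shared/CatalogUtil.class"));
    includes(handler, other).forEach(System.out::println);
  }
}
```

Every class added to, renamed in, or removed from a shared package changes this list, which is the maintenance burden that the shading approach avoids.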
was:
h1. Issue
{quote}Starting from 1.8.0 Iceberg doesn't release Hive runtime connector. For
Hive query engine integration (specifically with Hive 2.x and 3.x) use Hive
runtime connector coming with Iceberg 1.6.1, or use Hive 4.0.0 or later which
is released with embedded Iceberg integration.
[https://iceberg.apache.org/docs/latest/hive/#feature-support]
{quote}
Apache Spark uses {{iceberg-spark-runtime}}, which relied on {{HiveIcebergStorageHandler}} from {{iceberg-mr}} before 1.8, but that module was removed in 1.8. Therefore, a slim {{hive-iceberg-handler-core.jar}} without shading is required for Hive 3.x with Iceberg 1.8+. Apache Spark can import {{hive-iceberg-handler.jar}} together with {{iceberg-spark-runtime.jar}}, but some classes are present in both JAR files, which causes an {{InvalidClassException}}. For example:
{code:java}
java.io.InvalidClassException: org.apache.iceberg.BaseFile; local class incompatible: stream classdesc serialVersionUID = 8569836863676564712, local class serialVersionUID = -8072381884098305524
{code}
h1. Fix
Create a slim hive-iceberg-handler core JAR file to avoid the {{InvalidClassException}}.
Before:
* iceberg-shading
** {{maven-shade-plugin}} shades Iceberg and other dependencies.
* iceberg-handler
** {{maven-dependency-plugin}} unpacks iceberg-shading and iceberg-catalog, then packs them together.
After:
* iceberg-shading
** {{maven-shade-plugin}} shades Iceberg and other dependencies.
* iceberg-handler
** {{maven-shade-plugin}} shades iceberg-shading and iceberg-catalog without relocation, which results in the same JAR file that {{maven-dependency-plugin}} produced.
** {{maven-jar-plugin}} creates a new slim JAR without shaded classes.
Keeping {{maven-dependency-plugin}} in {{iceberg-handler}} does not work for the slim JAR: it overwrites the class directory with the unpacked dependency classes, so {{maven-jar-plugin}} would package those classes too. Restricting the JAR with {{<configuration><includes></includes></configuration>}} would avoid this, but because many Java packages are shared across artifacts, almost 100 individual class names would have to be listed explicitly. That list would be hard to maintain whenever a class in those packages changes.
> Create a slim hive-iceberg-handler core JAR
> -------------------------------------------
>
> Key: HIVE-29507
> URL: https://issues.apache.org/jira/browse/HIVE-29507
> Project: Hive
> Issue Type: New Feature
> Components: Iceberg integration
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Priority: Major
> Labels: pull-request-available