[ 
https://issues.apache.org/jira/browse/HIVE-29507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-29507:
------------------------------
    Description: 
h1. Issue
{quote}Starting from 1.8.0 Iceberg doesn't release Hive runtime connector. For 
Hive query engine integration (specifically with Hive 2.x and 3.x) use Hive 
runtime connector coming with Iceberg 1.6.1, or use Hive 4.0.0 or later which 
is released with embedded Iceberg integration.

[https://iceberg.apache.org/docs/latest/hive/#feature-support]
{quote}
Apache Spark uses {{iceberg-spark-runtime}}, which relied on 
{{HiveIcebergStorageHandler}} from {{iceberg-mr}} before 1.8, but that module 
is no longer available as of 1.8. Therefore, a slim 
{{hive-iceberg-handler-core.jar}} file without shaded classes is required for 
Hive 3.x and Iceberg 1.8+. Apache Spark can load {{hive-iceberg-handler.jar}} 
together with {{iceberg-spark-runtime.jar}}, but some classes are present in 
both JAR files, which causes an {{InvalidClassException}}. For example:
{code:java}
java.io.InvalidClassException: org.apache.iceberg.BaseFile; local class 
incompatible: stream classdesc serialVersionUID = 8569836863676564712, local 
class serialVersionUID = -8072381884098305524{code}
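The duplicate classes can be confirmed from inside the failing JVM. The following is a minimal diagnostic sketch, not part of Hive or Iceberg: the class name {{DuplicateClassProbe}} and its methods are hypothetical, and it assumes the conflicting JARs are on the application classpath. It lists every classpath location that provides a given class and reports the {{serialVersionUID}} that the local JVM resolves for it.

```java
import java.io.IOException;
import java.io.ObjectStreamClass;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic helper: finds classes provided by more than one JAR. */
final class DuplicateClassProbe {

    /** Returns every classpath URL that contains the .class file of the named class. */
    static List<URL> locations(String className, ClassLoader loader) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the serialVersionUID that this JVM uses for a serializable class. */
    static long localSerialVersionUid(Class<?> type) {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) {
            throw new IllegalArgumentException(type.getName() + " is not serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults to the class named in the exception above.
        String name = args.length > 0 ? args[0] : "org.apache.iceberg.BaseFile";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();

        List<URL> found = locations(name, loader);
        found.forEach(url -> System.out.println("provided by: " + url));
        if (found.size() > 1) {
            System.out.println("WARNING: " + name + " is provided by " + found.size() + " locations");
        }
        if (!found.isEmpty()) {
            // Load without initializing; only the class descriptor is needed.
            Class<?> type = Class.forName(name, false, loader);
            System.out.println("local serialVersionUID = " + localSerialVersionUid(type));
        }
    }
}
```

If both JARs bundle {{org.apache.iceberg.BaseFile}}, running this on the same classpath as the failing job should print more than one location for it.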
h1. Fix

Create a slim hive-iceberg-handler core JAR file to avoid the 
{{InvalidClassException}}.

Before:
 * iceberg-shading
 ** {{maven-shade-plugin}} shades Iceberg and other dependencies.
 * iceberg-handler
 ** {{maven-dependency-plugin}} unpacks iceberg-shading and iceberg-catalog 
then packs them together.

After:
 * iceberg-shading
 ** {{maven-shade-plugin}} shades Iceberg and other dependencies.
 * iceberg-handler
 ** {{maven-shade-plugin}} shades iceberg-shading and iceberg-catalog without 
relocation, which produces the same JAR file that {{maven-dependency-plugin}} did.
 ** {{maven-jar-plugin}} creates a new slim JAR without shaded classes.

{{maven-dependency-plugin}} in {{iceberg-handler}} overwrites the class 
directory, which also changes what {{maven-jar-plugin}} packages. The 
workaround with {{maven-jar-plugin}} is to use 
{{<configuration><includes></includes></configuration>}}, but because many 
Java packages are shared across artifacts, almost 100 individual class names 
would have to be configured explicitly. A list of that size would be hard to 
maintain whenever a class in those packages changes.
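
Whether the slim JAR still shares classes with another JAR can be checked by comparing their entries. The following is a minimal sketch under stated assumptions: {{JarOverlapCheck}} is a hypothetical helper, not part of the Hive build, and the two JAR paths are supplied by the caller. It prints the {{.class}} entries that appear in both files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

/** Hypothetical helper: reports class entries that two JAR files have in common. */
final class JarOverlapCheck {

    /** Returns the sorted names of all .class entries in the given JAR. */
    static Set<String> classEntries(Path jar) {
        try (JarFile file = new JarFile(jar.toFile())) {
            return file.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.endsWith(".class"))
                    .collect(Collectors.toCollection(TreeSet::new));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the .class entries that are present in both JARs. */
    static Set<String> overlap(Path first, Path second) {
        Set<String> shared = classEntries(first);
        shared.retainAll(classEntries(second));
        return shared;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: JarOverlapCheck <first.jar> <second.jar>");
            return;
        }
        Set<String> shared = overlap(Paths.get(args[0]), Paths.get(args[1]));
        shared.forEach(System.out::println);
        System.out.println(shared.size() + " class entries appear in both JARs");
    }
}
```

Run against the slim core JAR and {{iceberg-spark-runtime.jar}}, an empty result would suggest that the two can be loaded together without duplicate classes. The check compares entry names only, so it does not detect relocated copies of the same class.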

  was:
h1. Issue
{quote}Starting from 1.8.0 Iceberg doesn't release Hive runtime connector. For 
Hive query engine integration (specifically with Hive 2.x and 3.x) use Hive 
runtime connector coming with Iceberg 1.6.1, or use Hive 4.0.0 or later which 
is released with embedded Iceberg integration.

[https://iceberg.apache.org/docs/latest/hive/#feature-support]
{quote}
For Hive 3.x and Iceberg 1.8+, a slim {{hive-iceberg-handler-core.jar}} file 
without shaded classes is required. Apache Spark can load 
{{hive-iceberg-handler.jar}} together with {{iceberg-spark-runtime.jar}}, but 
some classes are present in both JAR files, which causes an 
{{InvalidClassException}}. For example:
{code:java}
java.io.InvalidClassException: org.apache.iceberg.BaseFile; local class 
incompatible: stream classdesc serialVersionUID = 8569836863676564712, local 
class serialVersionUID = -8072381884098305524{code}
h1. Fix

Create a slim hive-iceberg-handler core JAR file to avoid the 
{{InvalidClassException}}.

Before:
 * iceberg-shading
 ** {{maven-shade-plugin}} shades Iceberg and other dependencies.
 * iceberg-handler
 ** {{maven-dependency-plugin}} unpacks iceberg-shading and iceberg-catalog 
then packs them together.

After:
 * iceberg-shading
 ** {{maven-shade-plugin}} shades Iceberg and other dependencies.
 * iceberg-handler
 ** {{maven-shade-plugin}} shades iceberg-shading and iceberg-catalog without 
relocation, which produces the same JAR file that {{maven-dependency-plugin}} did.
 ** {{maven-jar-plugin}} creates a new slim JAR without shaded classes.

{{maven-dependency-plugin}} in {{iceberg-handler}} overwrites the class 
directory, which also changes what {{maven-jar-plugin}} packages. The 
workaround with {{maven-jar-plugin}} is to use 
{{<configuration><includes></includes></configuration>}}, but because many 
Java packages are shared across artifacts, almost 100 individual class names 
would have to be configured explicitly. A list of that size would be hard to 
maintain whenever a class in those packages changes.


> Create a slim hive-iceberg-handler core JAR
> -------------------------------------------
>
>                 Key: HIVE-29507
>                 URL: https://issues.apache.org/jira/browse/HIVE-29507
>             Project: Hive
>          Issue Type: New Feature
>          Components: Iceberg integration
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Major


