pan3793 commented on issue #4793:
URL: https://github.com/apache/hudi/issues/4793#issuecomment-1038016578


   @nsivabalan thanks for your reply.
   
   > may I know which hudi bundle or artifact you are using?
   
   We use the vanilla jars instead of the bundle jar for two reasons:
   
   - The Hudi bundle jar name contains the exact Spark patch version, e.g. 
`hudi-spark3.1.2-bundle*`. If we choose it and later want to upgrade Spark to 
3.1.3 (currently in the voting phase), do we need to wait for, or ask, the Hudi 
community to publish a `hudi-spark3.1.3-bundle*` jar?
   
   - The Hudi bundle jar contains many classes from transitive dependencies 
**WITHOUT** relocation, which creates a high risk of class conflicts if the user 
also provides the original jars, e.g. `kotlin`, `curator`.
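
   To make the conflict risk concrete, here is a minimal sketch of how one 
could check whether a bundle jar and other jars on the classpath ship the same 
class files. It is not part of Hudi; `find_duplicate_classes` is a hypothetical 
helper, and it only relies on the fact that a jar is a zip archive.

```python
import zipfile
from collections import defaultdict


def find_duplicate_classes(jar_paths):
    """Return {class entry: [jars containing it]} for entries found in 2+ jars."""
    owners = defaultdict(list)
    for jar_path in jar_paths:
        with zipfile.ZipFile(jar_path) as jar:
            # A jar is a zip archive; compiled classes are entries ending in .class.
            # set() guards against an archive listing the same entry twice.
            for name in set(jar.namelist()):
                if name.endswith(".class"):
                    owners[name].append(jar_path)
    # Only entries shipped by more than one jar can conflict at class-load time.
    return {name: jars for name, jars in owners.items() if len(jars) > 1}


# Usage (the paths are placeholders, not real file names):
# overlaps = find_duplicate_classes(["/path/to/bundle.jar", "/path/to/other.jar"])
# for class_entry, jars in sorted(overlaps.items()):
#     print(class_entry, jars)
```

   A relocated (shaded) class lives under a different package path, so it would 
not show up as an overlap in such a check.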
    
   I think Hudi has room to improve the bundle jar to reduce the dependency 
maintenance effort for users and downstream projects. Compare other data lake 
formats. Delta avoids pulling in dependencies other than Spark: 
[delta-core](https://mvnrepository.com/artifact/io.delta/delta-core_2.12/1.1.0) 
has only one transitive dependency, `jackson-core-asl`, that is not included in 
the Spark runtime jars. Iceberg provides a `runtime` jar, which is similar to the 
Hudi bundle jars but differs in the following ways:
   1. The Iceberg runtime jar does not contain classes that already exist in 
the Spark runtime libraries, e.g. `curator`.
   2. The Iceberg runtime jar relocates nearly every class outside the 
`org.apache.iceberg` package to avoid potential conflicts with user classes.
   3. Iceberg provides a runtime jar for each supported Spark minor version, 
e.g. 
[`iceberg-spark-runtime-0.13.0.jar`](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime/0.13.0/iceberg-spark-runtime-0.13.0.jar)
 for Spark 2.4.x, 
[`iceberg-spark3-runtime-0.13.0.jar`](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark3-runtime/0.13.0/iceberg-spark3-runtime-0.13.0.jar)
 for Spark 3.0.x, 
[`iceberg-spark-runtime-3.1_2.12-0.13.0.jar`](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.1_2.12/0.13.0/iceberg-spark-runtime-3.1_2.12-0.13.0.jar)
 for Spark 3.1.x, and 
[`iceberg-spark-runtime-3.2_2.12-0.13.0.jar`](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.2_2.12/0.13.0/iceberg-spark-runtime-3.2_2.12-0.13.0.jar)
 for Spark 3.2.x.
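
   The practical effect of naming by minor version can be sketched as follows. 
This is only an illustration built from the four 0.13.0 artifact names listed 
above; `iceberg_runtime_artifact` is a hypothetical helper, not an Iceberg API, 
and the `2.12` Scala suffix is taken from those names.

```python
def iceberg_runtime_artifact(spark_version, scala_binary="2.12"):
    """Map a full Spark version to the Iceberg 0.13.0 runtime artifact name.

    Hypothetical helper: the mapping only covers the artifacts named in the
    list above and is not an official lookup.
    """
    major, minor = spark_version.split(".")[:2]
    spark_line = f"{major}.{minor}"  # the patch component is deliberately ignored
    if spark_line == "2.4":
        return "iceberg-spark-runtime"
    if spark_line == "3.0":
        return "iceberg-spark3-runtime"
    if spark_line in ("3.1", "3.2"):
        return f"iceberg-spark-runtime-{spark_line}_{scala_binary}"
    raise ValueError(f"no runtime artifact listed for Spark {spark_version}")


# A patch upgrade keeps the same artifact, so nothing new has to be published.
assert iceberg_runtime_artifact("3.1.2") == iceberg_runtime_artifact("3.1.3")
```

   With a name keyed on the full patch version, as in `hudi-spark3.1.2-bundle*`, 
the same upgrade would point at a different artifact.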

