eshu opened a new issue, #5719:
URL: https://github.com/apache/hudi/issues/5719
I'm trying to build a fat JAR with the Hudi bundle and Spark 3.1 (the AWS Glue
version) using Scala 2.12.
None of the issues below exist in Hudi 0.10.1 and earlier versions.
1. Dependencies:
> [error] Modules were resolved with conflicting cross-version suffixes in ProjectRef(uri("file:/Users/shu/workspace/daas-glue-core/"), "root"):
> [error] org.json4s:json4s-ast _2.12, _2.11
> [error] org.json4s:json4s-jackson _2.12, _2.11
> [error] org.json4s:json4s-core _2.12, _2.11
> [error] org.json4s:json4s-scalap _2.12, _2.11
Why do I have dependencies for both Scala 2.12 and 2.11?
Workaround: I added an exclusion rule:
```
("org.apache.hudi" %% "hudi-spark3" %
HudiVersion).excludeAll(ExclusionRule("org.json4s", "json4s-jackson_2.11"))
```
There is also a dependency on `hudi-spark-common_2.11`; you can check
https://mvnrepository.com/artifact/org.apache.hudi/hudi-spark3_2.12/0.11.0
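For completeness, a broader version of the same workaround (a sketch only; it assumes an sbt build where `HudiVersion` is defined elsewhere, and simply excludes every `_2.11` artifact reported above plus `hudi-spark-common_2.11`):
```
// Sketch: exclude all known _2.11 transitives of the Hudi Spark 3 module at once.
libraryDependencies += ("org.apache.hudi" %% "hudi-spark3" % HudiVersion)
  .excludeAll(
    ExclusionRule("org.json4s", "json4s-ast_2.11"),
    ExclusionRule("org.json4s", "json4s-core_2.11"),
    ExclusionRule("org.json4s", "json4s-jackson_2.11"),
    ExclusionRule("org.json4s", "json4s-scalap_2.11"),
    ExclusionRule("org.apache.hudi", "hudi-spark-common_2.11")
  )
```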
**Why are there dependencies on Scala 2.11?**
2. Multiple sources found for hudi (org.apache.hudi.Spark2DefaultSource,
org.apache.hudi.Spark3DefaultSource)
When trying to write the empty dataset:
```
[info] - should write the empty dataset *** FAILED ***
[info]   org.apache.spark.sql.AnalysisException: Multiple sources found for hudi (org.apache.hudi.Spark2DefaultSource, org.apache.hudi.Spark3DefaultSource), please specify the fully qualified class name.
[info]   at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:720)
[info]   at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:746)
[info]   at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:993)
[info]   at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:311)
[info]   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
```
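The exception message itself suggests one direct alternative: pass the fully qualified class name instead of the `hudi` short name. A minimal sketch (assuming `df`, `hudiOptions`, and `basePath` are defined in the test):
```
// Hypothetical test snippet: write with the fully qualified source class
// instead of the ambiguous short name "hudi".
df.write
  .format("org.apache.hudi.Spark3DefaultSource")
  .options(hudiOptions) // the usual Hudi writer options (assumed defined)
  .save(basePath)
```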
Workaround: I created the following class in the package `org.apache.hudi`:
```
package org.apache.hudi

class Spark2DefaultSource extends DefaultSource {
  override def shortName(): String = "hudi-spark2"
}
```
This class shadows the Hudi implementation, so I can discard the duplicate in the merge rules.
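A minimal sketch of the corresponding merge rule (assuming sbt-assembly; `MergeStrategy.first` here relies on the project's own classes being seen before the bundled Hudi ones):
```
// Sketch: keep only one Spark2DefaultSource class in the fat JAR.
assembly / assemblyMergeStrategy := {
  case PathList("org", "apache", "hudi", "Spark2DefaultSource.class") =>
    MergeStrategy.first
  case x =>
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    oldStrategy(x)
}
```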
**Why are there two conflicting definitions?**
3. Spark 3.1 is not supported
The same test as in the previous example:
```
[info] - should write the empty dataset *** FAILED ***
[info]   java.lang.ClassNotFoundException: org.apache.spark.sql.adapter.Spark3_1Adapter
[info]   at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
[info]   at sbt.internal.ManagedClassLoader.findClass(ManagedClassLoader.java:102)
[info]   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
[info]   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
[info]   at org.apache.hudi.SparkAdapterSupport.sparkAdapter(SparkAdapterSupport.scala:37)
[info]   at org.apache.hudi.SparkAdapterSupport.sparkAdapter$(SparkAdapterSupport.scala:29)
[info]   at org.apache.hudi.HoodieSparkUtils$.sparkAdapter$lzycompute(HoodieSparkUtils.scala:46)
[info]   at org.apache.hudi.HoodieSparkUtils$.sparkAdapter(HoodieSparkUtils.scala:46)
[info]   at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:150)
[info]   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:241)
```
I did not find a good workaround for this issue. The class Spark3_1Adapter does
not exist; I found only Spark3_2Adapter, and there are many references to Spark
3.2 in the code.
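From the stack trace, `SparkAdapterSupport` resolves an adapter class for the running Spark version at runtime. Purely as an illustration of why this fails (this is not Hudi's actual source, just a sketch of the lookup the trace implies):
```
import org.apache.spark.SPARK_VERSION

// For Spark 3.1.x this yields "org.apache.spark.sql.adapter.Spark3_1Adapter";
// if no such class is on the classpath, Class.forName throws the
// ClassNotFoundException seen above.
val majorMinor = SPARK_VERSION.split('.').take(2).mkString("_")
val adapterClassName = s"org.apache.spark.sql.adapter.Spark${majorMinor}Adapter"
val adapterClass = Class.forName(adapterClassName)
```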
**Has support for Spark 3.1 been dropped?**