eshu opened a new issue, #5719:
URL: https://github.com/apache/hudi/issues/5719

   I'm trying to build a fat JAR with the Hudi bundle and Spark 3.1 (the AWS Glue version) support with Scala 2.12.
   
   None of these issues exist in Hudi 0.10.1 and earlier versions.
   
   1. Dependencies:
   > [error] Modules were resolved with conflicting cross-version suffixes in 
ProjectRef(uri("file:/Users/shu/workspace/daas-glue-core/"), "root"):
   > [error]    org.json4s:json4s-ast _2.12, _2.11
   > [error]    org.json4s:json4s-jackson _2.12, _2.11
   > [error]    org.json4s:json4s-core _2.12, _2.11
   > [error]    org.json4s:json4s-scalap _2.12, _2.11
   Why do I have dependencies for both Scala 2.12 and 2.11?
   Workaround: I added an exclusion rule:
   ```
   ("org.apache.hudi" %% "hudi-spark3" % 
HudiVersion).excludeAll(ExclusionRule("org.json4s", "json4s-jackson_2.11"))
   ```
   There is also a dependency on `hudi-spark-common_2.11`; you can check 
https://mvnrepository.com/artifact/org.apache.hudi/hudi-spark3_2.12/0.11.0
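   
   For completeness, extending the exclusion to every `_2.11` artifact listed above gives roughly this (a sketch from my `build.sbt`; `HudiVersion` is a value I define elsewhere):
   ```scala
   // Exclude all Scala 2.11 artifacts that hudi-spark3 transitively pulls in.
   // HudiVersion is assumed to be defined elsewhere, e.g. val HudiVersion = "0.11.0".
   libraryDependencies += ("org.apache.hudi" %% "hudi-spark3" % HudiVersion)
     .excludeAll(
       ExclusionRule("org.json4s", "json4s-jackson_2.11"),
       ExclusionRule("org.json4s", "json4s-ast_2.11"),
       ExclusionRule("org.json4s", "json4s-core_2.11"),
       ExclusionRule("org.json4s", "json4s-scalap_2.11"),
       ExclusionRule("org.apache.hudi", "hudi-spark-common_2.11")
     )
   ```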
   
   **Why are there dependencies on Scala 2.11?**
   
   2. Multiple sources found for hudi (org.apache.hudi.Spark2DefaultSource, 
org.apache.hudi.Spark3DefaultSource)
   When trying to write an empty dataset:
   ```
   [info] - should write the empty dataset *** FAILED ***
   [info]   org.apache.spark.sql.AnalysisException: Multiple sources found for 
hudi (org.apache.hudi.Spark2DefaultSource, 
org.apache.hudi.Spark3DefaultSource), please specify the fully qualified class 
name.
   [info]   at 
org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:720)
   [info]   at 
org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:746)
   [info]   at 
org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:993)
   [info]   at 
org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:311)
   [info]   at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
   ```
   Workaround: I created the following class in the package `org.apache.hudi`:
   ```
   package org.apache.hudi
   
   class Spark2DefaultSource extends DefaultSource  {
     override def shortName(): String = "hudi-spark2"
   }
   ```
   This class shadows Hudi's implementation, and I can discard it in the assembly merge rules.
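   
   An alternative I considered, following the exception's own hint, is to skip the ambiguous `hudi` short name and pass the fully qualified class name when writing (untested sketch; the table name and path are hypothetical):
   ```scala
   // Name the Spark 3 source directly so the lookup is unambiguous.
   df.write
     .format("org.apache.hudi.Spark3DefaultSource")
     .option("hoodie.table.name", "my_table")   // hypothetical table name
     .mode("append")
     .save("s3://my-bucket/hudi/my_table")      // hypothetical path
   ```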
   
   **Why are there two conflicting definitions?**
   
   3. Spark 3.1 is not supported
   The same test as in the previous example:
   ```
   [info] - should write the empty dataset *** FAILED ***
   [info]   java.lang.ClassNotFoundException: 
org.apache.spark.sql.adapter.Spark3_1Adapter
   [info]   at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
   [info]   at 
sbt.internal.ManagedClassLoader.findClass(ManagedClassLoader.java:102)
   [info]   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
   [info]   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
   [info]   at 
org.apache.hudi.SparkAdapterSupport.sparkAdapter(SparkAdapterSupport.scala:37)
   [info]   at 
org.apache.hudi.SparkAdapterSupport.sparkAdapter$(SparkAdapterSupport.scala:29)
   [info]   at 
org.apache.hudi.HoodieSparkUtils$.sparkAdapter$lzycompute(HoodieSparkUtils.scala:46)
   [info]   at 
org.apache.hudi.HoodieSparkUtils$.sparkAdapter(HoodieSparkUtils.scala:46)
   [info]   at 
org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:150)
   [info]   at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:241)
   ```
   
   I did not find a good workaround for this issue. The class `Spark3_1Adapter` does 
not exist; I found only `Spark3_2Adapter`, and there are many references to Spark 
3.2 in the code.
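   
   To confirm which adapter classes the fat JAR actually ships, I used a small probe like this (run in a `spark-shell` with the JAR on the classpath):
   ```scala
   // Check which Hudi Spark adapter classes are present on the classpath.
   Seq("Spark3_1Adapter", "Spark3_2Adapter").foreach { name =>
     val fqcn  = s"org.apache.spark.sql.adapter.$name"
     val found = scala.util.Try(Class.forName(fqcn)).isSuccess
     println(s"$fqcn -> ${if (found) "present" else "missing"}")
   }
   ```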
   
   **Has support for Spark 3.1 been dropped?**


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]