deshpandeanoop opened a new issue #3635:
URL: https://github.com/apache/hudi/issues/3635
Hi Team,
I'm getting a `java.lang.NoSuchMethodError` when I launch my Spark application in standalone mode.
**Exception Trace:**
```
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.avro.Schema.createUnion([Lorg/apache/avro/Schema;)Lorg/apache/avro/Schema;
    at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:185)
    at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$$anonfun$5.apply(SchemaConverters.scala:176)
    at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$$anonfun$5.apply(SchemaConverters.scala:174)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at org.apache.spark.sql.types.StructType.foreach(StructType.scala:99)
    at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:174)
    at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:52)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:139)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
```
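For what it's worth, the missing method is the varargs `Schema.createUnion(Schema...)` overload, which as far as I can tell exists in Avro 1.8.x but not in the Avro 1.7.x line that Spark 2.3.x bundles. Below is a small reflection sketch (my own diagnostic, not Hudi or Spark code; the object name is made up) that checks whether the `org.apache.avro.Schema` visible on the classpath has this overload, and from which jar it was loaded:

```
// Standalone diagnostic (my own sketch, not Hudi/Spark code): checks whether
// the org.apache.avro.Schema on the classpath has the varargs
// createUnion(Schema...) overload that the stack trace reports as missing.
object AvroUnionCheck {
  def avroUnionStatus(): String =
    try {
      val schemaClass = Class.forName("org.apache.avro.Schema")
      // A Java varargs parameter Schema... compiles to a Schema[] parameter.
      val schemaArrayClass =
        java.lang.reflect.Array.newInstance(schemaClass, 0).getClass
      // Throws NoSuchMethodException if only the List-based overload exists.
      schemaClass.getMethod("createUnion", schemaArrayClass)
      val source = Option(schemaClass.getProtectionDomain.getCodeSource)
        .map(_.getLocation.toString)
        .getOrElse("unknown location")
      s"varargs createUnion present (Schema loaded from $source)"
    } catch {
      case _: ClassNotFoundException =>
        "org.apache.avro.Schema is not on the classpath"
      case _: NoSuchMethodException =>
        "varargs createUnion missing (Avro 1.7.x or older on the classpath)"
    }

  def main(args: Array[String]): Unit =
    println(s"avro check: ${avroUnionStatus()}")
}
```

Running this from the main job before the write should show which Avro jar actually wins on the driver classpath.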
Below is my `build.sbt` file:
```
scalaVersion := "2.11.8"

libraryDependencies += ("org.apache.spark" % "spark-core_2.11" % "2.3.1" % "provided")
  .exclude("org.apache.avro", "avro")
  .exclude("org.apache.avro", "avro-ipc")
  .exclude("org.apache.avro", "avro-mapred")

libraryDependencies += ("org.apache.spark" % "spark-sql_2.11" % "2.3.1" % "provided")
  .exclude("org.apache.avro", "avro")

libraryDependencies += "org.apache.hudi" % "hudi-spark-bundle_2.11" % "0.7.0"

libraryDependencies += "org.apache.spark" %% "spark-avro" % "2.4.4"

libraryDependencies += ("com.typesafe.play" %% "play-json" % "2.4.0-M3")
  .exclude("org.slf4j", "slf4j-api")
  .exclude("org.slf4j", "slf4j-log4j12")
  .exclude("org.slf4j", "jcl-over-slf4j")
  .exclude("io.netty", "netty-all")
```
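In case it helps narrow things down: since the error points at an Avro API that (as far as I know) only exists from Avro 1.8.x onward, one thing that could be tried is pinning Avro explicitly in `build.sbt` instead of only excluding it from Spark. An untested sketch (the version number is my assumption, chosen to match the jar passed via `--jars`):

```
// Untested sketch: force a single Avro version instead of relying only on
// the excludes above; 1.8.2 matches the jar passed on spark-submit.
libraryDependencies += "org.apache.avro" % "avro" % "1.8.2"
dependencyOverrides += "org.apache.avro" % "avro" % "1.8.2"
```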
Spark submit command:
```
spark-submit --master local --deploy-mode client \
  --jars <base-dir>/avro-1.8.2.jar \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --class "com.explore.hudi.HudiServiceMainJob" ApacheHudiService.jar
```
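Also worth noting: as far as I understand, `--jars` adds the jar to the classpath but does not place it ahead of the Avro that ships with the Spark distribution, so in client mode the driver may still resolve Spark's own Avro 1.7.x first. A variant that puts the jar at the front of the driver classpath (untested sketch; `<base-dir>` is a placeholder as above):

```
spark-submit --master local --deploy-mode client \
  --driver-class-path <base-dir>/avro-1.8.2.jar \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --class "com.explore.hudi.HudiServiceMainJob" ApacheHudiService.jar
```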
Versions of the software installed on my system:
- Scala: `2.11`
- Spark: `spark-2.3.1-bin-hadoop2.7`
Just to add: I'm trying to read a `csv` extract and create a `hudi` table from it. Below is the sample code that runs when I launch my Spark application.
```
sparkSession
  .read
  .csv(inputCsvAbsPath)
  .map(row => LibraryCheckoutInfo(
    bibNumber = row.getString(0),
    itemBarcode = row.getString(1),
    itemType = row.getString(2),
    collection = row.getString(3),
    callNumber = row.getString(4)))
  .write
  .format(AppConstants.SPARK_FORMAT_HUDI)
  .options(QuickstartUtils.getQuickstartWriteConfigs)
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "bibNumber")
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "itemType")
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "collection")
  .option(HoodieWriteConfig.TABLE_NAME, hudiTableName)
  .mode(SaveMode.Overwrite)
  .save(hudiTableBasePath)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]