GroovyDan opened a new issue, #8007: URL: https://github.com/apache/hudi/issues/8007
**Describe the problem you faced**

We recently started receiving the following error when trying to load data into our Data Lake via AWS Glue and Apache Hudi:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o915.pyWriteDynamicFrame.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/SchemaConverters$
```

We have not changed any code on our side, and the process had been running without issue for months.

**To Reproduce**

Steps to reproduce the behavior:

1. Use the AWS Glue Connector and try to upsert an existing table (a minimal sketch of the write call appears after the stacktrace below).

**Expected behavior**

I would expect the data to be upserted into the table.

**Environment Description**

* Hudi version : 0.10.1
* Spark version : 3.1.1
* Hive version : N/A
* Hadoop version : N/A
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Additional context**

* Glue version: 3.0
* Connector: https://709825985650.dkr.ecr.us-east-1.amazonaws.com/amazon-web-services/glue/hudi:0.10.1-glue3.0
* Example Hudi configurations:

```json
{
  "className": "org.apache.hudi",
  "hoodie.datasource.hive_sync.use_jdbc": "false",
  "hoodie.datasource.write.precombine.field": "_sdc_sequence",
  "hoodie.datasource.write.recordkey.field": "appsflyer_id,event_name,event_time",
  "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
  "hoodie.datasource.write.hive_style_partitioning": "true",
  "hoodie.datasource.write.row.writer.enable": "true",
  "hoodie.parquet.compression.codec": "snappy",
  "hoodie.table.name": "in_app_events",
  "hoodie.datasource.hive_sync.database": "everydollar_android_appsflyer_stitch_parquet",
  "hoodie.datasource.hive_sync.table": "in_app_events",
  "hoodie.datasource.hive_sync.enable": "true",
  "path": "s3://rs-prod-stitch-everydollar-android-appsflyer/data/reporting/in_app_events",
  "hoodie.index.type": "BLOOM",
  "hoodie.bloom.index.update.partition.path": "true",
  "hoodie.parquet.small.file.limit": "104857600",
  "hoodie.upsert.shuffle.parallelism": 20,
  "hoodie.datasource.write.operation": "upsert",
  "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
  "hoodie.cleaner.commits.retained": 10,
  "hoodie.datasource.write.partitionpath.field": "partn_date",
  "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
  "hoodie.datasource.hive_sync.partition_fields": "partn_date"
}
```

**Stacktrace**

```
py4j.protocol.Py4JJavaError: An error occurred while calling o915.pyWriteDynamicFrame.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/SchemaConverters$
    at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:63)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:226)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
    at com.amazonaws.services.glue.marketplace.connector.SparkCustomDataSink.writeDynamicFrame(CustomDataSink.scala:45)
    at com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:71)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.avro.SchemaConverters$
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 43 more
```
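For context, the failing `pyWriteDynamicFrame` call in the stacktrace corresponds to a Glue marketplace-connector write roughly like the sketch below (referenced from the "To Reproduce" step). This is a minimal sketch, assuming a Glue 3.0 PySpark job; the connection name, the stand-in input data, and the abbreviated options dict are hypothetical placeholders, not values from this issue beyond the configuration shown above.

```python
# Minimal sketch of the write path that fails above, assuming a Glue 3.0
# PySpark job using the Hudi marketplace connector. "hudi-connection" and the
# stand-in input rows are hypothetical; hudi_options abbreviates the
# configuration shown under "Additional context".
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Tiny stand-in for the real upstream data.
df = spark.createDataFrame(
    [("id-1", "purchase", "2023-02-20 12:00:00", 1, "2023-02-20")],
    "appsflyer_id string, event_name string, event_time string, "
    "_sdc_sequence long, partn_date string",
)
events_frame = DynamicFrame.fromDF(df, glue_context, "events_frame")

hudi_options = {
    "className": "org.apache.hudi",
    "connectionName": "hudi-connection",  # hypothetical Glue connection name
    "hoodie.table.name": "in_app_events",
    "hoodie.datasource.write.operation": "upsert",
    # ... remaining hoodie.* options from the configuration above ...
    "path": "s3://rs-prod-stitch-everydollar-android-appsflyer/data/reporting/in_app_events",
}

# DataSink.pyWriteDynamicFrame in the stacktrace is reached through this call.
glue_context.write_dynamic_frame.from_options(
    frame=events_frame,
    connection_type="marketplace.spark",
    connection_options=hudi_options,
)
```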

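The root `Caused by: java.lang.ClassNotFoundException` points at `org.apache.spark.sql.avro.SchemaConverters$`, which ships in Spark's separate `spark-avro` module rather than in Spark core. One way to diagnose this from inside the job is to probe the driver classpath for that class over py4j, as in the sketch below; this only checks visibility and is not a confirmed fix for this issue.

```python
# Diagnostic sketch: check whether the spark-avro module that provides
# org.apache.spark.sql.avro.SchemaConverters is visible to the driver.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

try:
    # "SchemaConverters$" is the Scala object's class name from the stacktrace.
    spark.sparkContext._jvm.java.lang.Class.forName(
        "org.apache.spark.sql.avro.SchemaConverters$"
    )
    print("spark-avro is on the driver classpath")
except Exception as exc:
    # py4j surfaces the JVM's ClassNotFoundException as a Py4JJavaError.
    print(f"spark-avro is NOT on the driver classpath: {exc}")
```

If the probe fails, a common workaround (an assumption here, not something confirmed in this thread) is to supply a `spark-avro` jar matching the job's Spark version, for example via Glue's `--extra-jars` job parameter.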