GroovyDan opened a new issue, #8007: URL: https://github.com/apache/hudi/issues/8007
**Describe the problem you faced**

We recently started receiving the following error when trying to load data into our Data Lake via AWS Glue and Apache Hudi:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o915.pyWriteDynamicFrame.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/SchemaConverters$
```

We have not changed any code on our side, and the process had been running without issue for months.

**To Reproduce**

Steps to reproduce the behavior:

1. Use the AWS Glue Connector and try to upsert an existing table (a minimal sketch of the write call appears after the stacktrace below).

**Expected behavior**

I would expect the data to be upserted into the table.

**Environment Description**

* Hudi version : 0.10.1
* Spark version : 3.1.1
* Hive version : N/A
* Hadoop version : N/A
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Additional context**

* Glue version: 3.0
* Connector: https://709825985650.dkr.ecr.us-east-1.amazonaws.com/amazon-web-services/glue/hudi:0.10.1-glue3.0
* Example Hudi configurations:

```json
{
  "className": "org.apache.hudi",
  "hoodie.datasource.hive_sync.use_jdbc": "false",
  "hoodie.datasource.write.precombine.field": "_sdc_sequence",
  "hoodie.datasource.write.recordkey.field": "appsflyer_id,event_name,event_time",
  "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
  "hoodie.datasource.write.hive_style_partitioning": "true",
  "hoodie.datasource.write.row.writer.enable": "true",
  "hoodie.parquet.compression.codec": "snappy",
  "hoodie.table.name": "in_app_events",
  "hoodie.datasource.hive_sync.database": "everydollar_android_appsflyer_stitch_parquet",
  "hoodie.datasource.hive_sync.table": "in_app_events",
  "hoodie.datasource.hive_sync.enable": "true",
  "path": "s3://rs-prod-stitch-everydollar-android-appsflyer/data/reporting/in_app_events",
  "hoodie.index.type": "BLOOM",
  "hoodie.bloom.index.update.partition.path": "true",
  "hoodie.parquet.small.file.limit": "104857600",
  "hoodie.upsert.shuffle.parallelism": 20,
  "hoodie.datasource.write.operation": "upsert",
  "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
  "hoodie.cleaner.commits.retained": 10,
  "hoodie.datasource.write.partitionpath.field": "partn_date",
  "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
  "hoodie.datasource.hive_sync.partition_fields": "partn_date"
}
```

**Stacktrace**

```
py4j.protocol.Py4JJavaError: An error occurred while calling o915.pyWriteDynamicFrame.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/SchemaConverters$
    at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:63)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:226)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
    at com.amazonaws.services.glue.marketplace.connector.SparkCustomDataSink.writeDynamicFrame(CustomDataSink.scala:45)
    at com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:71)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.avro.SchemaConverters$
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 43 more
```
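For context, the failing `pyWriteDynamicFrame` call in the stacktrace corresponds to a Glue marketplace-connector write roughly like the sketch below (referenced from the "To Reproduce" step). This is a minimal sketch, assuming a Glue 3.0 PySpark job; the connection name, the stand-in input data, and the abbreviated options dict are hypothetical placeholders, not values from this issue beyond the configuration shown above.

```python
# Minimal sketch of the write path that fails above, assuming a Glue 3.0
# PySpark job using the Hudi marketplace connector. "hudi-connection" and the
# stand-in input rows are hypothetical; hudi_options abbreviates the
# configuration shown under "Additional context".
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Tiny stand-in for the real upstream data.
df = spark.createDataFrame(
    [("id-1", "purchase", "2023-02-20 12:00:00", 1, "2023-02-20")],
    "appsflyer_id string, event_name string, event_time string, "
    "_sdc_sequence long, partn_date string",
)
events_frame = DynamicFrame.fromDF(df, glue_context, "events_frame")

hudi_options = {
    "className": "org.apache.hudi",
    "connectionName": "hudi-connection",  # hypothetical Glue connection name
    "hoodie.table.name": "in_app_events",
    "hoodie.datasource.write.operation": "upsert",
    # ... remaining hoodie.* options from the configuration above ...
    "path": "s3://rs-prod-stitch-everydollar-android-appsflyer/data/reporting/in_app_events",
}

# DataSink.pyWriteDynamicFrame in the stacktrace is reached through this call.
glue_context.write_dynamic_frame.from_options(
    frame=events_frame,
    connection_type="marketplace.spark",
    connection_options=hudi_options,
)
```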

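The root `Caused by: java.lang.ClassNotFoundException` points at `org.apache.spark.sql.avro.SchemaConverters$`, which ships in Spark's separate `spark-avro` module rather than in Spark core. One way to diagnose this from inside the job is to probe the driver classpath for that class over py4j, as in the sketch below; this only checks visibility and is not a confirmed fix for this issue.

```python
# Diagnostic sketch: check whether the spark-avro module that provides
# org.apache.spark.sql.avro.SchemaConverters is visible to the driver.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

try:
    # "SchemaConverters$" is the Scala object's class name from the stacktrace.
    spark.sparkContext._jvm.java.lang.Class.forName(
        "org.apache.spark.sql.avro.SchemaConverters$"
    )
    print("spark-avro is on the driver classpath")
except Exception as exc:
    # py4j surfaces the JVM's ClassNotFoundException as a Py4JJavaError.
    print(f"spark-avro is NOT on the driver classpath: {exc}")
```

If the probe fails, a common workaround (an assumption here, not something confirmed in this thread) is to supply a `spark-avro` jar matching the job's Spark version, for example via Glue's `--extra-jars` job parameter.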