zyperd opened a new issue, #10561:
URL: https://github.com/apache/hudi/issues/10561

   
   When Hudi reads the Debezium-ingested topics, the following error message is displayed. Kindly help to identify the issue.
   
   ```
   Caused by: java.lang.NoSuchMethodException: org.apache.hudi.utilities.sources.debezium.MysqlDebeziumSource.<init>(org.apache.hudi.common.config.TypedProperties,org.apache.spark.api.java.JavaSparkContext,org.apache.spark.sql.SparkSession,org.apache.hudi.utilities.schema.SchemaProvider)
   ```

   In the source, the constructor is declared as:

   ```
   public MysqlDebeziumSource(TypedProperties props, JavaSparkContext sparkContext,
                              SparkSession sparkSession,
                              SchemaProvider schemaProvider,
                              HoodieIngestionMetrics metrics)
   ```

   Is the spark-submit command missing any Hudi config?

   hudi-aws-bundle.jar -> hudi-utilities-bundle_2.12-0.14.0-amzn-1.jar
   
   
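   For context, `NoSuchMethodException` is what Java reflection throws when the requested constructor signature is not declared on the class: the lookup asks for the four-argument constructor, while the `MysqlDebeziumSource` on the classpath only declares the five-argument one (with `HoodieIngestionMetrics`), which usually points at a version mismatch between the jar doing the lookup and the jar supplying the class. A minimal, self-contained sketch of the same failure mode (the `Source` class below is a hypothetical stand-in, not Hudi code):

   ```java
   import java.lang.reflect.Constructor;

   public class ConstructorMismatchDemo {

       // Hypothetical stand-in for MysqlDebeziumSource: only a
       // two-argument constructor is declared.
       static class Source {
           Source(String props, Integer metrics) {}
       }

       // Reflective lookup with a one-argument signature, mirroring the
       // four-argument lookup against a class that only declares five.
       static String lookupOneArgConstructor() {
           try {
               Constructor<Source> c = Source.class.getDeclaredConstructor(String.class);
               return "found: " + c;
           } catch (NoSuchMethodException e) {
               return "NoSuchMethodException: " + e.getMessage();
           }
       }

       public static void main(String[] args) {
           // Fails because no Source(String) constructor exists.
           System.out.println(lookupOneArgConstructor());
       }
   }
   ```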
   ```
   spark-submit \
         --master yarn \
         --deploy-mode cluster \
         --driver-memory 2g --executor-memory 1g --num-executors 1 --executor-cores 1 \
         --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
         --conf spark.sql.catalogImplementation=hive \
         --conf spark.driver.maxResultSize=1g \
         --conf spark.speculation=true \
         --conf spark.speculation.multiplier=1.0 \
         --conf spark.speculation.quantile=0.5 \
         --conf spark.ui.port=6680 \
         --conf spark.eventLog.dir=s3://spark_events/ \
         --conf spark.eventLog.enabled=true \
         --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
         --conf spark.scheduler.mode=FAIR \
         --jars /usr/lib/hudi/hudi-aws-bundle.jar,/home/hadoop/kafka-avro-serializer-3.1.1.jar \
         --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer /usr/lib/hudi/hudi-utilities-bundle.jar \
         --target-base-path s3://mysql_cdc/table_cdc/ \
         --source-class org.apache.hudi.utilities.sources.debezium.MysqlDebeziumSource \
         --payload-class org.apache.hudi.common.model.debezium.MySqlDebeziumAvroPayload \
         --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
         --source-ordering-field id \
         --target-table table_cdc \
         --table-type COPY_ON_WRITE \
         --op UPSERT \
         --enable-hive-sync \
         --sync-tool-classes org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool \
         --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor \
         --hoodie-conf auto.offset.reset=earliest \
         --hoodie-conf bootstrap.servers=127.0.0.1:9002 \
         --hoodie-conf hoodie.deltastreamer.source.kafka.topic="table_cdc" \
         --hoodie-conf hoodie.deltastreamer.source.kafka.value.deserializer.class=org.apache.hudi.utilities.deser.KafkaAvroSchemaDeserializer \
         --hoodie-conf hoodie.datasource.hive_sync.enable=true \
         --hoodie-conf hoodie.datasource.hive_sync.database=default \
         --hoodie-conf hoodie.datasource.hive_sync.table=table_cdc \
         --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false \
         --hoodie-conf hoodie.datasource.write.recordkey.field=id \
         --hoodie-conf hoodie.datasource.write.partitionpath.field=value_type \
         --hoodie-conf hoodie.compaction.payload.class=org.apache.hudi.common.model.DebeziumAvroPayload \
         --hoodie-conf hoodie.table.name=table_cdc \
         --hoodie-conf hoodie.streamer.schemaprovider.source.schema.file=file:///source.avsc \
         --hoodie-conf hoodie.streamer.schemaprovider.target.schema.file=file:///target.avsc \
         --hoodie-conf hoodie.datasource.hive_sync.partition_fields=value_type \
         --hoodie-conf hoodie.datasource.write.hive_style_partitioning=false \
         --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://cdc-events/ \
         --hoodie-conf hoodie.datasource.hive_sync.mode=hms
   ```
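   One way to narrow down which jar actually supplies `MysqlDebeziumSource` (and hence which constructor signature ends up on the classpath) is to scan each bundle jar for the class entry. A small stdlib sketch; the jar path is taken from the command above and assumes the EMR layout:

   ```java
   import java.io.File;
   import java.io.IOException;
   import java.util.Collections;
   import java.util.zip.ZipEntry;
   import java.util.zip.ZipFile;

   public class JarEntryProbe {

       // Returns true if the jar at jarPath contains an entry whose
       // path contains the given needle (e.g. a class file name).
       static boolean containsEntry(String jarPath, String needle) throws IOException {
           try (ZipFile jar = new ZipFile(jarPath)) {
               return Collections.list(jar.entries()).stream()
                       .map(ZipEntry::getName)
                       .anyMatch(name -> name.contains(needle));
           }
       }

       public static void main(String[] args) throws IOException {
           // Path from the spark-submit command above (EMR layout; adjust as needed).
           String bundle = "/usr/lib/hudi/hudi-utilities-bundle.jar";
           if (new File(bundle).exists()) {
               System.out.println(containsEntry(bundle,
                       "sources/debezium/MysqlDebeziumSource.class"));
           } else {
               System.out.println("jar not found at " + bundle);
           }
       }
   }
   ```

   Running the same probe over both hudi-aws-bundle.jar and the utilities bundle should show whether the Debezium source class is shipped twice, in which case whichever jar loads first determines the constructor that reflection sees.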
   
   #source.avsc
   ```
   {
     "type": "record",
     "name": "ChangeEvent",
     "fields": [
       {
         "name": "before",
         "type": ["null", "string"]
       },
       {
         "name": "after",
         "type": {
           "type": "record",
           "name": "After",
           "fields": [
             { "name": "id", "type": ["int"] },
             { "name": "values", "type": "string" },
              { "name": "value_type", "type": "string" }
            ]
         }
       },
       {
         "name": "source",
         "type": {
           "type": "record",
           "name": "Source",
           "fields": [
             { "name": "version", "type": ["null", "string"] },
             { "name": "connector", "type": ["null", "string"] },
             { "name": "name", "type": ["null", "string"] },
             { "name": "ts_ms", "type": ["null", "long"] },
             { "name": "snapshot", "type": ["null", "boolean"] },
             { "name": "db", "type": ["null", "string"] },
             { "name": "sequence", "type": ["null", "string"] },
             { "name": "table", "type": ["null", "string"] },
             { "name": "server_id", "type": ["null", "long"] },
             { "name": "gtid", "type": ["null", "string"] },
             { "name": "file", "type": ["null", "string"] },
             { "name": "pos", "type": ["null", "int"] },
             { "name": "row", "type": ["null", "int"] },
             { "name": "thread", "type": ["null", "int"] },
             { "name": "query", "type": ["null", "string"] }
           ]
         }
       },
        {
         "name": "op",
         "type": ["null", "string"]
       },
       {
         "name": "ts_ms",
         "type": ["null", "long"]
       },
       {
         "name": "transaction",
         "type": ["null", "string"]
       }
     ]
   }
   ```
   
   #target.avsc
   ```
   {
     "type": "record",
     "name": "cdc",
     "fields": [
       {
         "name": "id",
         "type": ["int"]
       },
       {
         "name": "value_type",
         "type": "string"
       },
        {
         "name": "values",
         "type": "string"
        }
      ]
    }
   ```
   
   
   
   
   **Environment Description**
   
   * Hudi version : 0.14.0-amzn-1
   
   * Spark version : 3.5.0 (emr-7.0.0)
   
   * Hive version : 3.1.3 (EMR)
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   
   **Stacktrace**

   While running with hudi-utilities-slim-bundle_2.12-0.14.0-amzn-1.jar, the following error stack is shown:

   ```
   org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.SparkThreadUtils$.awaitResult(SparkThreadUtils.scala:56) ~[spark-common-utils_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:310) ~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
        at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:509) ~[spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
        at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268) ~[spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:937) ~[spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:936) ~[spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
        at java.security.AccessController.doPrivileged(AccessController.java:712) [?:?]
        at javax.security.auth.Subject.doAs(Subject.java:439) [?:?]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) [hadoop-client-api-3.3.6-amzn-2.jar:?]
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:936) [spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
        at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) [spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
   Caused by: java.util.concurrent.ExecutionException: Boxed Error
        at scala.concurrent.impl.Promise$.resolver(Promise.scala:87) ~[scala-library-2.12.17.jar:?]
        at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:79) ~[scala-library-2.12.17.jar:?]
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) ~[scala-library-2.12.17.jar:?]
        at scala.concurrent.Promise.tryFailure(Promise.scala:112) ~[scala-library-2.12.17.jar:?]
        at scala.concurrent.Promise.tryFailure$(Promise.scala:112) ~[scala-library-2.12.17.jar:?]
        at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:187) ~[scala-library-2.12.17.jar:?]
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:760) ~[spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
   Caused by: java.lang.NoClassDefFoundError: org/apache/hudi/client/common/HoodieSparkEngineContext
        at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:575) ~[?:?]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:741) ~[spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
   Caused by: java.lang.ClassNotFoundException: org.apache.hudi.client.common.HoodieSparkEngineContext
        at java.net.URLClassLoader.findClass(URLClassLoader.java:445) ~[?:?]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:592) ~[?:?]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:525) ~[?:?]
        at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:575) ~[?:?]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:741) ~[spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
   24/01/25 05:48:40 INFO ApplicationMaster: Deleting staging directory hdfs://ip-127.0.0.1.region.compute.internal:8020/user/hadoop/.sparkStaging/application_1705949945402_0077
   ```
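   The `NoClassDefFoundError`/`ClassNotFoundException` for `HoodieSparkEngineContext` with the slim bundle is typically a classpath-composition symptom: the slim utilities bundle deliberately excludes the Spark client classes and is meant to be paired with a matching Hudi Spark bundle on the same classpath. A quick, hedged way to probe from any JVM whether a given class is resolvable (the class name below is taken from the trace above):

   ```java
   public class ClasspathProbe {

       // Returns true if the named class can be loaded by the current
       // classloader, false if it is missing from the classpath.
       static boolean isLoadable(String className) {
           try {
               Class.forName(className);
               return true;
           } catch (ClassNotFoundException e) {
               return false;
           }
       }

       public static void main(String[] args) {
           // The class from the stack trace; false on the failing setup.
           System.out.println(isLoadable("org.apache.hudi.client.common.HoodieSparkEngineContext"));
           // A JDK class for comparison.
           System.out.println(isLoadable("java.util.zip.ZipFile"));
       }
   }
   ```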

