zyperd opened a new issue, #10561:
URL: https://github.com/apache/hudi/issues/10561
When Hudi reads the Debezium-ingested topics, the following error message is displayed. Kindly help to identify the issue:
```
Caused by: java.lang.NoSuchMethodException: org.apache.hudi.utilities.sources.debezium.MysqlDebeziumSource.<init>(org.apache.hudi.common.config.TypedProperties,org.apache.spark.api.java.JavaSparkContext,org.apache.spark.sql.SparkSession,org.apache.hudi.utilities.schema.SchemaProvider)
```
In the source, the constructor is declared as:
```
public MysqlDebeziumSource(TypedProperties props,
                           JavaSparkContext sparkContext,
                           SparkSession sparkSession,
                           SchemaProvider schemaProvider,
                           HoodieIngestionMetrics metrics)
```
Is the spark-submit command missing any Hudi config? Jars used: `hudi-aws-bundle.jar -> hudi-utilities-bundle_2.12-0.14.0-amzn-1.jar`
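For context, DeltaStreamer instantiates the source class reflectively, so the `NoSuchMethodException` means the caller asked for a four-argument constructor while the class on the classpath only declares the five-argument variant that takes `HoodieIngestionMetrics`. A minimal sketch of that mechanism, using a hypothetical stand-in class (`StandInSource` is not a Hudi class, and `String` stands in for the real parameter types):

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for MysqlDebeziumSource: like the 0.14.x source shown
// above, it only declares a five-argument constructor.
class StandInSource {
    StandInSource(String props, String ctx, String session, String schema, String metrics) {}
}

public class ReflectDemo {
    // Reflectively look up a four-argument constructor, as an older caller would.
    static String lookupFourArgCtor() {
        try {
            Constructor<?> c = StandInSource.class.getDeclaredConstructor(
                String.class, String.class, String.class, String.class);
            return "found " + c.getName();
        } catch (NoSuchMethodException e) {
            // Only the five-argument variant exists, so the lookup fails here.
            return "NoSuchMethodException";
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupFourArgCtor());
    }
}
```

This points at a version mismatch between the jar that performs the reflective lookup and the jar that contains the source class, rather than at a missing `--hoodie-conf` entry.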
```
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g --executor-memory 1g --num-executors 1 \
  --executor-cores 1 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.driver.maxResultSize=1g \
  --conf spark.speculation=true \
  --conf spark.speculation.multiplier=1.0 \
  --conf spark.speculation.quantile=0.5 \
  --conf spark.ui.port=6680 \
  --conf spark.eventLog.dir=s3://spark_events/ \
  --conf spark.eventLog.enabled=true \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
  --conf spark.scheduler.mode=FAIR \
  --jars /usr/lib/hudi/hudi-aws-bundle.jar,/home/hadoop/kafka-avro-serializer-3.1.1.jar \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer /usr/lib/hudi/hudi-utilities-bundle.jar \
  --target-base-path s3://mysql_cdc/table_cdc/ \
  --source-class org.apache.hudi.utilities.sources.debezium.MysqlDebeziumSource \
  --payload-class org.apache.hudi.common.model.debezium.MySqlDebeziumAvroPayload \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --source-ordering-field id \
  --target-table table_cdc \
  --table-type COPY_ON_WRITE \
  --op UPSERT \
  --enable-hive-sync \
  --sync-tool-classes org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool \
  --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor \
  --hoodie-conf auto.offset.reset=earliest \
  --hoodie-conf bootstrap.servers=127.0.0.1:9002 \
  --hoodie-conf hoodie.deltastreamer.source.kafka.topic="table_cdc" \
  --hoodie-conf hoodie.deltastreamer.source.kafka.value.deserializer.class=org.apache.hudi.utilities.deser.KafkaAvroSchemaDeserializer \
  --hoodie-conf hoodie.datasource.hive_sync.enable=true \
  --hoodie-conf hoodie.datasource.hive_sync.database=default \
  --hoodie-conf hoodie.datasource.hive_sync.table=table_cdc \
  --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false \
  --hoodie-conf hoodie.datasource.write.recordkey.field=id \
  --hoodie-conf hoodie.datasource.write.partitionpath.field=value_type \
  --hoodie-conf hoodie.compaction.payload.class=org.apache.hudi.common.model.DebeziumAvroPayload \
  --hoodie-conf hoodie.table.name=table_cdc \
  --hoodie-conf hoodie.streamer.schemaprovider.source.schema.file=file:///source.avsc \
  --hoodie-conf hoodie.streamer.schemaprovider.target.schema.file=file:///target.avsc \
  --hoodie-conf hoodie.datasource.hive_sync.partition_fields=value_type \
  --hoodie-conf hoodie.datasource.write.hive_style_partitioning=false \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://cdc-events/ \
  --hoodie-conf hoodie.datasource.hive_sync.mode=hms
```
**source.avsc**
```
{
  "type": "record",
  "name": "ChangeEvent",
  "fields": [
    { "name": "before", "type": ["null", "string"] },
    {
      "name": "after",
      "type": {
        "type": "record",
        "name": "After",
        "fields": [
          { "name": "id", "type": ["int"] },
          { "name": "values", "type": "string" },
          { "name": "value_type", "type": "string" }
        ]
      }
    },
    {
      "name": "source",
      "type": {
        "type": "record",
        "name": "Source",
        "fields": [
          { "name": "version", "type": ["null", "string"] },
          { "name": "connector", "type": ["null", "string"] },
          { "name": "name", "type": ["null", "string"] },
          { "name": "ts_ms", "type": ["null", "long"] },
          { "name": "snapshot", "type": ["null", "boolean"] },
          { "name": "db", "type": ["null", "string"] },
          { "name": "sequence", "type": ["null", "string"] },
          { "name": "table", "type": ["null", "string"] },
          { "name": "server_id", "type": ["null", "long"] },
          { "name": "gtid", "type": ["null", "string"] },
          { "name": "file", "type": ["null", "string"] },
          { "name": "pos", "type": ["null", "int"] },
          { "name": "row", "type": ["null", "int"] },
          { "name": "thread", "type": ["null", "int"] },
          { "name": "query", "type": ["null", "string"] }
        ]
      }
    },
    { "name": "op", "type": ["null", "string"] },
    { "name": "ts_ms", "type": ["null", "long"] },
    { "name": "transaction", "type": ["null", "string"] }
  ]
}
```
**target.avsc**
```
{
  "type": "record",
  "name": "cdc",
  "fields": [
    { "name": "id", "type": ["int"] },
    { "name": "value_type", "type": "string" },
    { "name": "values", "type": "string" }
  ]
}
```
**Environment Description**

* Hudi version : 0.14.0-amzn-1
* Spark version : 3.5.0 (emr-7.0.0)
* Hive version : 3.1.3 (EMR)
* Hadoop version :
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
**Stacktrace**
While running with hudi-utilities-slim-bundle_2.12-0.14.0-amzn-1.jar, the following error stack is shown:
```
org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.SparkThreadUtils$.awaitResult(SparkThreadUtils.scala:56) ~[spark-common-utils_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:310) ~[spark-core_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:509) ~[spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268) ~[spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:937) ~[spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:936) ~[spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
    at java.security.AccessController.doPrivileged(AccessController.java:712) [?:?]
    at javax.security.auth.Subject.doAs(Subject.java:439) [?:?]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) [hadoop-client-api-3.3.6-amzn-2.jar:?]
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:936) [spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) [spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
Caused by: java.util.concurrent.ExecutionException: Boxed Error
    at scala.concurrent.impl.Promise$.resolver(Promise.scala:87) ~[scala-library-2.12.17.jar:?]
    at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:79) ~[scala-library-2.12.17.jar:?]
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) ~[scala-library-2.12.17.jar:?]
    at scala.concurrent.Promise.tryFailure(Promise.scala:112) ~[scala-library-2.12.17.jar:?]
    at scala.concurrent.Promise.tryFailure$(Promise.scala:112) ~[scala-library-2.12.17.jar:?]
    at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:187) ~[scala-library-2.12.17.jar:?]
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:760) ~[spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
Caused by: java.lang.NoClassDefFoundError: org/apache/hudi/client/common/HoodieSparkEngineContext
    at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:575) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:741) ~[spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.client.common.HoodieSparkEngineContext
    at java.net.URLClassLoader.findClass(URLClassLoader.java:445) ~[?:?]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:592) ~[?:?]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:525) ~[?:?]
    at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:575) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:741) ~[spark-yarn_2.12-3.5.0-amzn-0.jar:3.5.0-amzn-0]
24/01/25 05:48:40 INFO ApplicationMaster: Deleting staging directory hdfs://ip-127.0.0.1.region.compute.internal:8020/user/hadoop/.sparkStaging/application_1705949945402_0077
```
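The root cause here is a `ClassNotFoundException` for `org.apache.hudi.client.common.HoodieSparkEngineContext`. That class is not packaged in the slim utilities bundle; per the Hudi docs, hudi-utilities-slim-bundle is meant to be paired with the matching hudi-spark bundle on the classpath, which is where the Spark client classes live. A small classpath probe (a generic sketch, not a Hudi utility) demonstrates the failure mode; run on a JVM without the Hudi jars it reports the class as missing:

```java
public class ClasspathProbe {
    // Attempts to load the given class from the current classpath and reports
    // whether it is present, mirroring what the ClassLoader does at runtime.
    static String probe(String className) {
        try {
            Class.forName(className);
            return "present";
        } catch (ClassNotFoundException e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        // The exact class named by the NoClassDefFoundError in the stacktrace.
        System.out.println(probe("org.apache.hudi.client.common.HoodieSparkEngineContext"));
    }
}
```

If this prints "missing" on the driver, adding the hudi-spark bundle jar alongside the slim utilities bundle (e.g. via `--jars`) should resolve the `NoClassDefFoundError`.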
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]