[
https://issues.apache.org/jira/browse/HUDI-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan updated HUDI-9057:
--------------------------------------
Status: Patch Available (was: In Progress)
> HoodieStreamer: Encountering ClassNotFoundException:
> io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaProvider after
> upgrading to Hudi 1.0.x version
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HUDI-9057
> URL: https://issues.apache.org/jira/browse/HUDI-9057
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: deltastreamer
> Reporter: Ranga Reddy
> Priority: Blocker
> Labels: HoodieStreamer, pull-request-available
> Fix For: 1.0.0, 1.0.1, 1.0.2
>
>
> While using the {{*org.apache.hudi.utilities.streamer.HoodieStreamer*}} class
> to extract data from Kafka and write it to a Hudi table, we encountered an
> issue.
> {code:java}
> Exception in thread "main"
> org.apache.hudi.utilities.ingestion.HoodieIngestionException: Ingestion
> service was shut down with exception.
> at
> org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:67)
> at org.apache.hudi.common.util.Option.ifPresent(Option.java:101)
> at
> org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:223)
> at
> org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:638)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> at
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1029)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
> at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.util.concurrent.ExecutionException:
> java.lang.NoClassDefFoundError:
> io/confluent/kafka/schemaregistry/protobuf/ProtobufSchemaProvider
> at
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2005)
> at
> org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:102)
> at
> org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:65)
> ... 15 more
> Caused by: java.lang.NoClassDefFoundError:
> io/confluent/kafka/schemaregistry/protobuf/ProtobufSchemaProvider
> at
> org.apache.hudi.utilities.schema.SchemaRegistryProvider.lambda$new$1e9d4812$1(SchemaRegistryProvider.java:113)
> at
> org.apache.hudi.utilities.schema.SchemaRegistryProvider.fetchSchemaFromRegistry(SchemaRegistryProvider.java:173)
> at
> org.apache.hudi.utilities.sources.debezium.DebeziumSource.fetchNextBatch(DebeziumSource.java:128)
> at
> org.apache.hudi.utilities.sources.RowSource.readFromCheckpoint(RowSource.java:61)
> at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:139)
> at
> org.apache.hudi.utilities.streamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:184)
> at
> org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:710)
> at
> org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:582)
> at
> org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:554)
> at
> org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:464)
> at
> org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.lambda$startService$1(HoodieStreamer.java:821)
> at
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ClassNotFoundException:
> io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaProvider
> at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
> ... 15 more {code}
> Spark Submit command:
> {code:java}
> spark-submit
> --deploy-mode cluster
> --master yarn
> --conf spark.rdd.compress=true
> --conf spark.shuffle.service.enabled=true
> --conf spark.kryoserializer.buffer.max=512m
> --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
> --conf spark.sql.shuffle.partitions=200
> --conf spark.default.parallelism=200
> --conf spark.streaming.kafka.allowNonConsecutiveOffsets=true
> --jars
> s3://some_bucket/libs/hudi-aws-bundle-1.0.1.jar,s3://some_bucket/libs/hudi-spark3.5-bundle_2.12-1.0.1.jar
>
> --class org.apache.hudi.utilities.streamer.HoodieStreamer
> s3://some_bucket/libs/hudi-utilities-slim-bundle_2.12-1.0.1.jar
> --schemaprovider-class
> org.apache.hudi.utilities.schema.SchemaRegistryProvider
> --props s3://path/to/file.properties
> --source-class org.apache.hudi.utilities.sources.AvroKafkaSource
> --source-ordering-field ts
> --target-base-path s3://path/to/file
> --target-table some_table
> --sync-tool-classes org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool
> --op UPSERT
> --transformer-class
> org.apache.hudi.utilities.transform.SqlFileBasedTransformer
> --enable-sync
> --table-type COPY_ON_WRITE {code}
> This issue occurred in both Hudi 1.0.0 and Hudi 1.0.1 versions.
> For more details, refer the following Hudi issue.
> https://github.com/apache/hudi/issues/12838
--
This message was sent by Atlassian Jira
(v8.20.10#820010)