Ranga Reddy created HUDI-9057:
---------------------------------
Summary: HoodieStreamer: Encountering ClassNotFoundException:
io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaProvider after
upgrading to Hudi 1.0.x version
Key: HUDI-9057
URL: https://issues.apache.org/jira/browse/HUDI-9057
Project: Apache Hudi
Issue Type: Bug
Components: deltastreamer
Reporter: Ranga Reddy
Fix For: 1.0.2, 1.0.1, 1.0.0
While using the {{*org.apache.hudi.utilities.streamer.HoodieStreamer*}} class
to extract data from Kafka and write it to a Hudi table, the job failed with
the following exception:
{code:java}
Exception in thread "main" org.apache.hudi.utilities.ingestion.HoodieIngestionException: Ingestion service was shut down with exception.
    at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:67)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:101)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:223)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:638)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1029)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: io/confluent/kafka/schemaregistry/protobuf/ProtobufSchemaProvider
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2005)
    at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:102)
    at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:65)
    ... 15 more
Caused by: java.lang.NoClassDefFoundError: io/confluent/kafka/schemaregistry/protobuf/ProtobufSchemaProvider
    at org.apache.hudi.utilities.schema.SchemaRegistryProvider.lambda$new$1e9d4812$1(SchemaRegistryProvider.java:113)
    at org.apache.hudi.utilities.schema.SchemaRegistryProvider.fetchSchemaFromRegistry(SchemaRegistryProvider.java:173)
    at org.apache.hudi.utilities.sources.debezium.DebeziumSource.fetchNextBatch(DebeziumSource.java:128)
    at org.apache.hudi.utilities.sources.RowSource.readFromCheckpoint(RowSource.java:61)
    at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:139)
    at org.apache.hudi.utilities.streamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:184)
    at org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:710)
    at org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:582)
    at org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:554)
    at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:464)
    at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.lambda$startService$1(HoodieStreamer.java:821)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaProvider
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
    ... 15 more
{code}
Spark Submit command:
{code:java}
spark-submit \
  --deploy-mode cluster \
  --master yarn \
  --conf spark.rdd.compress=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.kryoserializer.buffer.max=512m \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.shuffle.partitions=200 \
  --conf spark.default.parallelism=200 \
  --conf spark.streaming.kafka.allowNonConsecutiveOffsets=true \
  --jars s3://some_bucket/libs/hudi-aws-bundle-1.0.1.jar,s3://some_bucket/libs/hudi-spark3.5-bundle_2.12-1.0.1.jar \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  s3://some_bucket/libs/hudi-utilities-slim-bundle_2.12-1.0.1.jar \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --props s3://path/to/file.properties \
  --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
  --source-ordering-field ts \
  --target-base-path s3://path/to/file \
  --target-table some_table \
  --sync-tool-classes org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool \
  --op UPSERT \
  --transformer-class org.apache.hudi.utilities.transform.SqlFileBasedTransformer \
  --enable-sync \
  --table-type COPY_ON_WRITE
{code}
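One way to narrow this down is to check whether any of the jars passed to spark-submit actually ships the class named in the stack trace. The following is a minimal sketch, not part of Hudi: it assumes the jars have been copied from S3 into a local {{libs}} directory, and the helper names {{class_entry}} and {{jars_containing_class}} are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import Iterable, List

# Class reported missing in the stack trace.
MISSING_CLASS = "io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaProvider"


def class_entry(class_name: str) -> str:
    """Convert a fully qualified class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_containing_class(jar_paths: Iterable[str], class_name: str) -> List[str]:
    """Return the jars whose entries include the given class (hypothetical helper)."""
    entry = class_entry(class_name)
    hits = []
    for jar in jar_paths:
        # A jar is a zip archive, so the standard zipfile module can list it.
        with zipfile.ZipFile(jar) as archive:
            if entry in archive.namelist():
                hits.append(str(jar))
    return hits


if __name__ == "__main__":
    # Assumption: local copies of the jars from the spark-submit command live in ./libs.
    local_jars = sorted(str(p) for p in Path("libs").glob("*.jar"))
    found = jars_containing_class(local_jars, MISSING_CLASS)
    if found:
        print("Class found in:", found)
    else:
        print(f"{MISSING_CLASS} not found in {len(local_jars)} jar(s)")
```

An empty result would be consistent with the ClassNotFoundException above. It does not by itself indicate which bundle is expected to provide the class.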
This issue occurs with both Hudi 1.0.0 and Hudi 1.0.1.
For more details, see the following Hudi GitHub issue:
https://github.com/apache/hudi/issues/12838