Ranga Reddy created HUDI-9057:
---------------------------------

             Summary: HoodieStreamer: Encountering ClassNotFoundException: 
io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaProvider after 
upgrading to Hudi 1.0.x version
                 Key: HUDI-9057
                 URL: https://issues.apache.org/jira/browse/HUDI-9057
             Project: Apache Hudi
          Issue Type: Bug
          Components: deltastreamer
            Reporter: Ranga Reddy
             Fix For: 1.0.2, 1.0.1, 1.0.0


While using the {{*org.apache.hudi.utilities.streamer.HoodieStreamer*}} class 
to extract data from Kafka and write it to a Hudi table, we encountered an 
issue.
{code:java}
Exception in thread "main" 
org.apache.hudi.utilities.ingestion.HoodieIngestionException: Ingestion service 
was shut down with exception.
    at 
org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:67)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:101)
    at 
org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:223)
    at 
org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:638)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
    at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1029)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.concurrent.ExecutionException: 
java.lang.NoClassDefFoundError: 
io/confluent/kafka/schemaregistry/protobuf/ProtobufSchemaProvider
    at 
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
    at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2005)
    at 
org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:102)
    at 
org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:65)
    ... 15 more
Caused by: java.lang.NoClassDefFoundError: 
io/confluent/kafka/schemaregistry/protobuf/ProtobufSchemaProvider
    at 
org.apache.hudi.utilities.schema.SchemaRegistryProvider.lambda$new$1e9d4812$1(SchemaRegistryProvider.java:113)
    at 
org.apache.hudi.utilities.schema.SchemaRegistryProvider.fetchSchemaFromRegistry(SchemaRegistryProvider.java:173)
    at 
org.apache.hudi.utilities.sources.debezium.DebeziumSource.fetchNextBatch(DebeziumSource.java:128)
    at 
org.apache.hudi.utilities.sources.RowSource.readFromCheckpoint(RowSource.java:61)
    at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:139)
    at 
org.apache.hudi.utilities.streamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:184)
    at 
org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:710)
    at 
org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:582)
    at 
org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:554)
    at 
org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:464)
    at 
org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.lambda$startService$1(HoodieStreamer.java:821)
    at 
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: 
io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaProvider
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
    ... 15 more {code}
Spark Submit command:
{code:java}
spark-submit 
--deploy-mode cluster 
--master yarn 
--conf spark.rdd.compress=true 
--conf spark.shuffle.service.enabled=true 
--conf spark.kryoserializer.buffer.max=512m 
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer 
--conf spark.sql.shuffle.partitions=200 
--conf spark.default.parallelism=200 
--conf spark.streaming.kafka.allowNonConsecutiveOffsets=true 
--jars 
s3://some_bucket/libs/hudi-aws-bundle-1.0.1.jar,s3://some_bucket/libs/hudi-spark3.5-bundle_2.12-1.0.1.jar
 
--class org.apache.hudi.utilities.streamer.HoodieStreamer 
s3://some_bucket/libs/hudi-utilities-slim-bundle_2.12-1.0.1.jar 
--schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider 
--props s3://path/to/file.properties 
--source-class org.apache.hudi.utilities.sources.AvroKafkaSource 
--source-ordering-field ts 
--target-base-path s3://path/to/file
--target-table some_table 
--sync-tool-classes org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool 
--op UPSERT 
--transformer-class org.apache.hudi.utilities.transform.SqlFileBasedTransformer 
--enable-sync 
--table-type COPY_ON_WRITE {code}
This issue occurred in both Hudi 1.0.0 and Hudi 1.0.1 versions.

For more details, refer the following Hudi issue.

https://github.com/apache/hudi/issues/12838



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to