remeajayi2022 commented on issue #12301:
URL: https://github.com/apache/hudi/issues/12301#issuecomment-2496014392
Hi @the-other-tim-brown,
Thank you so much for taking the time to help with this earlier! I
appreciate your insights. Following your suggestion, I’ve downgraded the
kafka-protobuf-provider and kafka-json-schema-provider jars to version 5.5.0.
However, I’m encountering another issue related to Protobuf compatibility in
this configuration:
```
gcloud dataproc jobs submit spark \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  --jars <storage-path>/jars/hudi-utilities-bundle_2.12-1.0.0-SNAPSHOT.jar,<storage-path>/jars/kafka-protobuf-provider-5.5.0.jar,<storage-path>/jars/kafka-json-schema-provider-5.5.0.jar \
  -- \
  --source-class org.apache.hudi.utilities.sources.ProtoKafkaSource \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --hoodie-conf schema.registry.url=<url> \
  --hoodie-conf hoodie.streamer.schemaprovider.registry.url=<full-url>/subjects/datagen-proto-value/versions/latest \
  --hoodie-conf hoodie.streamer.schemaprovider.registry.schemaconverter=org.apache.hudi.utilities.schema.converter.ProtoSchemaToAvroSchemaConverter \
  --hoodie-conf hoodie.streamer.source.kafka.proto.value.deserializer.class=io.confluent.kafka.serializers.protobuf.KafkaProtobufDeserializer
```
When I run the job, I see the following errors:
1. With the jars included:
`Caused by: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: com/google/protobuf/Descriptors$DescriptorValidationException`
2. Without the jars:
`Caused by: java.lang.ClassNotFoundException: io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaProvider`
It seems like there may still be a missing dependency or configuration issue
that I haven’t accounted for. Do you have any suggestions on how I could
resolve these errors? Am I possibly overlooking a required library or some
specific classpath setup?
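One way to narrow this down would be to check which jar, if any, actually contains the classes named in the two errors. The following is only a minimal sketch: it assumes `unzip` is installed and that the jars have been copied to a local directory, and `class_to_entry` and `find_class_in_jars` are hypothetical helper names, not part of Hudi, Confluent, or gcloud tooling.

```shell
# Convert a dotted class name to the entry path used inside a jar,
# e.g. a.b.C -> a/b/C.class (hypothetical helper).
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every jar in directory $1 that contains class $2 (hypothetical helper).
# The body runs in a subshell so its variables do not leak into the caller.
find_class_in_jars() (
  entry="$(class_to_entry "$2")"
  for jar in "$1"/*.jar; do
    # Skip the unexpanded pattern when the directory holds no jars.
    [ -e "$jar" ] || continue
    # unzip -l lists entries without extracting; grep -F matches literally,
    # which matters because nested class names contain '$'.
    if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
)

# Example usage (the ./jars directory is an assumption):
#   find_class_in_jars ./jars 'com.google.protobuf.Descriptors$DescriptorValidationException'
#   find_class_in_jars ./jars 'io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaProvider'
```

If neither call prints a jar, the class is not provided by any jar in that directory and would have to come from the cluster's own classpath or from an additional dependency.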
Thanks again for your time and support—it’s greatly appreciated!