Hi All,

Hope you are doing well. I am trying to run the Hudi utilities DeltaStreamer (HoodieDeltaStreamer). Below is the command line I am passing:
spark2-submit --master yarn --deploy-mode cluster \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /tmp/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar \
  --props /user/oozie/dataops/hoodie/config.properties \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
  --source-ordering-field LastModified_dtmStamp \
  --target-base-path /tmp/hudi-deltastreamer-op_TEST \
  --target-table testTableHoodie \
  --op UPSERT \
  --enable-hive-sync \
  --storage-type MERGE_ON_READ

I have also attached the config file.

Unfortunately, while writing the Parquet files, the job fails with:

"java.lang.NoClassDefFoundError: org/apache/parquet/hadoop/metadata/CompressionCodecName"

The full error trace is attached for your reference. There are a few warnings about the configuration, but I am not sure whether they are related to the problem. I have also tried supplying the classpath, with the same result. I am not sure what I am missing here, and it would be great if anybody could help.

Hadoop version: 2.6.0-cdh5.14.2
Spark version: 2.3.0.cloudera2

*Regards,*
*Shahida R. Khan*
*+91 9167538366*
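
P.S. One check that might help narrow this down is whether the bundle jar contains the missing class at all, or only a relocated (shaded) copy under another package. Below is a minimal sketch of such a check using only the Python standard library. The helper names jar_contains_class and find_relocated are hypothetical, made up for this illustration, and the script says nothing about what the YARN containers actually have on their classpath at runtime.

```python
import sys
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar has an entry for the given class.

    class_name may be dotted (a.b.C) or slash-separated (a/b/C).
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


def find_relocated(jar_path, simple_name):
    """List every entry ending in /<simple_name>.class.

    Useful for spotting copies that were relocated under another package.
    """
    suffix = "/" + simple_name + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist() if name.endswith(suffix)]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python check_jar.py <path-to-jar>
    missing = "org/apache/parquet/hadoop/metadata/CompressionCodecName"
    print("exact class present:", jar_contains_class(sys.argv[1], missing))
    for name in find_relocated(sys.argv[1], "CompressionCodecName"):
        print("found entry:", name)
```

Run against /tmp/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar, this would show whether the class is packaged under the exact name from the error or only under a different package.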
