Shahida,

Welcome to Hudi. I am not an expert with DeltaStreamer, as I do not use it. In general, I think this points to a problem with how the fat jar was built: either it was not built to include all the dependencies, or your classpath did not include the jar that is needed. For some reason I did not receive the full stack trace attachment; either you forgot to attach it or the mail system blocked it.

Can you please check that your pom has the dependency shown below:

<!-- https://mvnrepository.com/artifact/org.apache.parquet/parquet-hadoop -->
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-hadoop</artifactId>
  <version>1.8.3</version>
</dependency>
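
If it helps to check several jars at once (the bundle plus anything else on the classpath), here is a minimal Python sketch that reports which jars contain a given class. The helper names `class_entry` and `jars_containing` are made up for illustration and are not part of Hudi or Spark. It only looks for the class at its exact package path, so a class that was relocated during shading would not be found.

```python
import zipfile


def class_entry(class_name):
    """Turn a dotted or slashed class name into its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(class_name, jar_paths):
    """Return the jar paths (as strings) whose entries include the class."""
    entry = class_entry(class_name)
    hits = []
    for jar in jar_paths:
        # A jar is a zip archive, so the standard zipfile module can list it.
        with zipfile.ZipFile(jar) as zf:
            if entry in zf.namelist():
                hits.append(str(jar))
    return hits


# Example call (the jar path is whatever you pass to spark2-submit):
# jars_containing(
#     "org.apache.parquet.hadoop.metadata.CompressionCodecName",
#     ["/tmp/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar"],
# )
```

An empty result for the bundle would suggest the class was never packaged into it; a non-empty result would point more towards a classpath or version conflict on the cluster.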
Can you also run

```
jar -tvf shahida.jar | grep -i CompressionCodecName
```

and let us know the output that you see. Once we have the answers to the above, we can see what is missing and address that hopefully.

Kabeer.

On Oct 15 2019, at 10:39 am, Shahida Khan <[email protected]> wrote:
> Hi All,
>
> Hope you are doing well.
> I am currently trying to implement the Hudi Utilities using Delta Streamer.
> Below is the command line configuration I am passing
>
> spark2-submit --master yarn --deploy-mode cluster --class
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
> /tmp/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar --props
> /user/oozie/dataops/hoodie/config.properties --schemaprovider-class
> org.apache.hudi.utilities.schema.SchemaRegistryProvider --source-class
> org.apache.hudi.utilities.sources.AvroKafkaSource --source-ordering-field
> LastModified_dtmStamp
> --target-base-path /tmp/hudi-deltastreamer-op_TEST --target-table
> testTableHoodie --op UPSERT --enable-hive-sync --storage-type MERGE_ON_READ
>
> Also, have attached the config file too.
>
> Unfortunately, while writing the files in parquet, it throws an exception as
> "java.lang.NoClassDefFoundError:
> org/apache/parquet/hadoop/metadata/CompressionCodecName"
> Full Error Trace has been attached for your reference.
>
> There are few warnings with respect to configuration but not sure if that's
> the problem.
>
> I have tried giving the classpath as well. I am not sure what i am missing
> here.
> It would be great if anybody could help me here.
>
> Hadoop version :- 2.6.0-cdh5.14.2
> Spark version :- 2.3.0.cloudera2
>
>
> Regards,
> Shahida R. Khan
> +91 9167538366
