First off, thanks Kabeer for stepping up and answering a bunch of these questions! It really helps us scale community support. Keep 'em coming :)

Hi Shahida,

The utilities-bundle 0.5.1-SNAPSHOT (master) does not bundle the parquet jars (except parquet-avro); it uses the ones from the Spark installation instead. The only major difference I notice is that you are on CDH. CDH Spark and Hive tend to differ from Apache or other distros. Can you run jar tf on the parquet-* jars in the Spark installation's jars and see if you find the class?

Also, I am not sure I follow how including dependencies in your own project would affect anything if you are simply passing the utilities-bundle to spark-submit. Kindly clarify.

On Tue, Oct 15, 2019 at 6:50 AM Shahida Khan <[email protected]> wrote:

> Hi Kabeer,
>
> I have added both dependencies and tried that too.
> Just a version change: I have used parquet-hadoop 1.8.1 since
> parquet-avro is 1.8.1.
> It looks like this:
>
> <parquet.version>1.8.1</parquet.version>
>
> <dependency>
>   <groupId>org.apache.parquet</groupId>
>   <artifactId>parquet-avro</artifactId>
>   <version>${parquet.version}</version>
>   <!-- <scope>provided</scope> -->
> </dependency>
> <dependency>
>   <groupId>org.apache.parquet</groupId>
>   <artifactId>parquet-hadoop</artifactId>
>   <version>${parquet.version}</version>
> </dependency>
>
> Regards,
> Shahida R. Khan
>
> On Tue, Oct 15, 2019 at 7:12 PM Kabeer Ahmed <[email protected]> wrote:
>
> > Thank you Shahida. Can you please confirm that you have included both the
> > below dependencies and tried the build?
> >
> > If your build is missing parquet-hadoop, then the required class may not
> > be found. If you have already included the below dependencies and it still
> > doesn't work, I can upload a jar for you to try.
> >
> > <dependency>
> >   <groupId>org.apache.parquet</groupId>
> >   <artifactId>parquet-avro</artifactId>
> >   <version>${parquet.version}</version>
> >   <scope>provided</scope>
> > </dependency>
> >
> > <dependency>
> >   <groupId>org.apache.parquet</groupId>
> >   <artifactId>parquet-hadoop</artifactId>
> >   <version>1.8.3</version>
> > </dependency>
> >
> > On Oct 15 2019, at 2:28 pm, Shahida Khan <[email protected]> wrote:
> > > Hi Kabeer,
> > >
> > > Thank you for the quick response!
> > > Also, our project already includes the below dependency, which I believe
> > > should include "org.apache.parquet.parquet-hadoop":
> > >
> > > <dependency>
> > >   <groupId>org.apache.parquet</groupId>
> > >   <artifactId>parquet-avro</artifactId>
> > >   <version>${parquet.version}</version>
> > > </dependency>
> > >
> > > I have even checked with ```jar -tvf shahida.jar | grep -i CompressionCodecName```;
> > > the class is not available in the jar even after including it in the build.
> > >
> > > The strange thing is, I have even provided the parquet-avro jar via
> > > spark-submit, and it behaves differently for 1.7 and 1.8.
> > > It seems like some configuration is missing with respect to
> > > HoodieStorageConfig.PARQUET_COMPRESSION_CODEC.
> > >
> > > Regards,
> > > Shahida R. Khan
> > >
> > > On Tue, Oct 15, 2019 at 4:24 PM Kabeer Ahmed <[email protected]> wrote:
> > > > Shahida
> > > >
> > > > Welcome to Hudi. I am not an expert with DeltaStreamer as I do not use
> > > > it. In general, I think this points to an issue with the build of the
> > > > fat jar. It looks to me that either you didn't build the fat jar to
> > > > include all the dependencies or your class path didn't include the jar
> > > > needed.
> > > > For some reason I didn't receive the full stack trace attachment.
> > > > Either you forgot to attach it or the mail system blocked it.
> > > > Can you please check:
> > > > That your pom has the dependency shown below:
> > > > <!-- https://mvnrepository.com/artifact/org.apache.parquet/parquet-hadoop -->
> > > > <dependency>
> > > >   <groupId>org.apache.parquet</groupId>
> > > >   <artifactId>parquet-hadoop</artifactId>
> > > >   <version>1.8.3</version>
> > > > </dependency>
> > > >
> > > > Can you also run ```jar -tvf shahida.jar | grep -i CompressionCodecName```
> > > > and let us know the output that you see.
> > > > Once we have the answers to the above, we can see what is missing and
> > > > hopefully address it.
> > > > Kabeer.
> > > >
> > > > On Oct 15 2019, at 10:39 am, Shahida Khan <[email protected]> wrote:
> > > > > Hi All,
> > > > >
> > > > > Hope you are doing well.
> > > > > I am currently trying to implement the Hudi Utilities using Delta
> > > > > Streamer. Below is the command line configuration I am passing:
> > > > >
> > > > > spark2-submit --master yarn --deploy-mode cluster \
> > > > >   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
> > > > >   /tmp/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar \
> > > > >   --props /user/oozie/dataops/hoodie/config.properties \
> > > > >   --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
> > > > >   --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
> > > > >   --source-ordering-field LastModified_dtmStamp \
> > > > >   --target-base-path /tmp/hudi-deltastreamer-op_TEST \
> > > > >   --target-table testTableHoodie --op UPSERT \
> > > > >   --enable-hive-sync --storage-type MERGE_ON_READ
> > > > >
> > > > > Also, I have attached the config file.
> > > > >
> > > > > Unfortunately, while writing the files in parquet, it throws an
> > > > > exception: "java.lang.NoClassDefFoundError:
> > > > > org/apache/parquet/hadoop/metadata/CompressionCodecName"
> > > > > The full error trace has been attached for your reference.
> > > > >
> > > > > There are a few warnings with respect to configuration, but I am not
> > > > > sure if that's the problem.
> > > > >
> > > > > I have tried giving the classpath as well. I am not sure what I am
> > > > > missing here.
> > > > > It would be great if anybody could help me here.
> > > > >
> > > > > Hadoop version :- 2.6.0-cdh5.14.2
> > > > > Spark version :- 2.3.0.cloudera2
> > > > >
> > > > > Regards,
> > > > > Shahida R. Khan
> > > > > +91 9167538366
> > >
> > > The information contained in this transmission may contain privileged
> > > and confidential information of Big Tree Entertainment Pvt Ltd, including
> > > information protected by privacy laws. It is intended only for the use of
> > > Big Tree Entertainment Pvt Ltd. If you are not the intended recipient, you
> > > are hereby notified that any review, dissemination, distribution, or
> > > duplication of this communication is strictly prohibited. If you are not
> > > the intended recipient, please contact the sender by reply email and
> > > destroy all copies of the original message. Although Big Tree
> > > Entertainment Pvt Ltd. has taken reasonable precautions to ensure no
> > > viruses are present in this email, Big Tree Entertainment Pvt Ltd. cannot
> > > accept responsibility for any loss or damage arising from the use of this
> > > email or attachments. Computer viruses can be transmitted via email.
> > > Recipient should check the email and any attachments for the presence of
> > > viruses before using them. Any views or opinions are solely those of the
> > > author and do not necessarily represent those of Big Tree Entertainment
> > > Pvt Ltd.
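
P.S. For anyone repeating the jar check suggested at the top of this thread on a machine without the JDK jar tool, here is a minimal Python sketch of the same idea. It only assumes that jars are ordinary zip archives; the function name jars_containing and the directory you pass in are hypothetical, so point it at wherever your distro keeps the Spark jars.

```python
import zipfile
from pathlib import Path

# Class reported missing in this thread, written as its path inside a jar.
TARGET = "org/apache/parquet/hadoop/metadata/CompressionCodecName.class"


def jars_containing(jar_dir, entry=TARGET, pattern="parquet-*.jar"):
    """Return the names of jars in jar_dir matching pattern that contain entry."""
    hits = []
    for jar in sorted(Path(jar_dir).glob(pattern)):
        if not jar.is_file():
            continue
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid jar/zip archives.
            continue
    return hits
```

Calling jars_containing("/path/to/spark/jars") returns an empty list when none of the parquet-* jars on that path ship the class, which would be consistent with the NoClassDefFoundError above. Passing pattern="*.jar" widens the search to every jar in the directory.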
