Shahida,

How have you been faring with the setup? Did any of the options work for you? If you are still out of luck, I can spin up a CDH cluster in Docker and model your test case.

From your emails, am I right in assuming that all you want to do is use DeltaStreamer to ingest from Kafka into Hudi and create a Hive table on top of that?

Thanks,
Kabeer.

On Oct 16 2019, at 9:19 am, Shahida Khan <[email protected]> wrote:
> Thanks Kabeer for helping me with the doc. I am currently looking into it
> and trying multiple options; if I come across anything, I will keep you
> posted.
>
> @Vinoth
> As suggested, I have whitelisted the parquet dependency in the
> above-mentioned pom file, but still got the same error message.
>
> As of now, I am directly using the Hudi-Utilities jar; there is no
> separate project of my own.
> Also, with respect to Hive, I am aware of the same. As of now I have
> shared the Hive URL of 1.x, which is used by CDH, in the config.
>
> Again attaching the logs for reference.
>
> Regards,
> Shahida R. Khan
>
> On Tue, Oct 15, 2019 at 9:22 PM Vinoth Chandar <[email protected]> wrote:
> > > I have added both the dependency and tried too.
> >
> > If you are trying to get the hudi-utilities bundle to include a jar,
> > then you also need to whitelist it explicitly here
> > https://github.com/apache/incubator-hudi/blob/master/packaging/hudi-utilities-bundle/pom.xml#L67
> >
> > Heads up: you may hit issues with Hive since CDH Hive is still 1.x (the
> > world is moving to Hive 3+ slowly, all other cloud/distro vendors are on
> > Hive 2.x).
> >
> > On Tue, Oct 15, 2019 at 8:33 AM Kabeer Ahmed <[email protected]> wrote:
> > > Shahida
> > >
> > > Thanks for trying out various options. I do not work on the CDH
> > > platform, so I am hoping that someone on the Cloudera platform will
> > > help you with the jar. If I remember right, there are CDH jars that
> > > are deployed under /etc/cloudera/cdh/version and you should be able to
> > > find the jar path to include on your classpath. I can also see that
> > > this file is referenced in the Cloudera documentation at:
> > > https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_ig_parquet.html
> > > (this is the same CDH 5.14 that you are using). Please try to search
> > > for the jar with the name: parquet-hadoop-1.8.*.jar.
> > >
> > > On another thread
> > > (https://github.com/bigdatagenomics/adam/issues/1742) I also see an
> > > option of using: --packages org.apache.parquet:parquet-hadoop:1.8.2.
> > >
> > > If nothing works then we can use the brute force method of downloading
> > > the jar manually from the link below and placing it on the classpath.
> > > https://mvnrepository.com/artifact/org.apache.parquet/parquet-hadoop/1.8.3
> > >
> > > Can you please try these options and report back with your
> > > observations? I do understand why the parquet-hadoop dependency in the
> > > pom.xml didn't do the magic: it seems the Hudi code doesn't have a real
> > > dependency on it. I still need to check the dependency tree to see
> > > whether this jar is included through parquet-avro.
> > >
> > > There are loads of users from companies like Uber on this thread who
> > > use Hudi on CDH. Someone must have a solution for your issue.
> > >
> > > On Oct 15 2019, at 2:51 pm, Shahida Khan <[email protected]> wrote:
> > > > Hi Kabeer,
> > > >
> > > > I have added both the dependency and tried too.
> > > > Just a version change: I have used parquet-hadoop 1.8.1 since
> > > > parquet-avro is 1.8.1.
> > > > It looks like this:
> > > >
> > > > <parquet.version>1.8.1</parquet.version>
> > > >
> > > > <dependency>
> > > >   <groupId>org.apache.parquet</groupId>
> > > >   <artifactId>parquet-avro</artifactId>
> > > >   <version>${parquet.version}</version>
> > > >   <!-- <scope>provided</scope> -->
> > > > </dependency>
> > > > <dependency>
> > > >   <groupId>org.apache.parquet</groupId>
> > > >   <artifactId>parquet-hadoop</artifactId>
> > > >   <version>${parquet.version}</version>
> > > > </dependency>
> > > >
> > > > Regards,
> > > > Shahida R. Khan
> > > >
> > > > On Tue, Oct 15, 2019 at 7:12 PM Kabeer Ahmed <[email protected]> wrote:
> > > > > Thank you Shahida. Can you please confirm that you have included
> > > > > both of the dependencies below and tried the build?
> > > > >
> > > > > If your build is missing parquet-hadoop, then the required class
> > > > > may not be found. If you have already included the dependencies
> > > > > below and it still doesn't work, I can upload a jar for you to try.
> > > > >
> > > > > <dependency>
> > > > >   <groupId>org.apache.parquet</groupId>
> > > > >   <artifactId>parquet-avro</artifactId>
> > > > >   <version>${parquet.version}</version>
> > > > >   <scope>provided</scope>
> > > > > </dependency>
> > > > >
> > > > > <dependency>
> > > > >   <groupId>org.apache.parquet</groupId>
> > > > >   <artifactId>parquet-hadoop</artifactId>
> > > > >   <version>1.8.3</version>
> > > > > </dependency>
> > > > >
> > > > > On Oct 15 2019, at 2:28 pm, Shahida Khan <[email protected]> wrote:
> > > > > > Hi Kabeer,
> > > > > >
> > > > > > Thank you for the quick response!
> > > > > > Also, our project already includes the dependency below. I
> > > > > > believe this should include "org.apache.parquet.parquet-hadoop".
> > > > > >
> > > > > > <dependency>
> > > > > >   <groupId>org.apache.parquet</groupId>
> > > > > >   <artifactId>parquet-avro</artifactId>
> > > > > >   <version>${parquet.version}</version>
> > > > > > </dependency>
> > > > > >
> > > > > > I have even checked with ```jar -tvf shahida.jar | grep -i
> > > > > > CompressionCodecName```: the class is not available in the jar
> > > > > > even after including it in the build.
> > > > > >
> > > > > > The strange thing is, I have even provided the parquet-avro jar
> > > > > > via spark-submit, and it behaves differently for 1.7 and 1.8.
> > > > > > It seems like there is some configuration missing with respect to
> > > > > > HoodieStorageConfig.PARQUET_COMPRESSION_CODEC.
> > > > > >
> > > > > > Regards,
> > > > > > Shahida R. Khan
> > > > > >
> > > > > > On Tue, Oct 15, 2019 at 4:24 PM Kabeer Ahmed <[email protected]> wrote:
> > > > > > > Shahida
> > > > > > >
> > > > > > > Welcome to Hudi. I am not an expert with DeltaStreamer as I do
> > > > > > > not use it.
> > > > > > > In general, I think this points to an issue with the build of
> > > > > > > the fat jar. It looks to me that either you didn't build the
> > > > > > > fat jar to include all the dependencies or your classpath
> > > > > > > didn't include the jar needed.
> > > > > > > For some reason I didn't receive the full stack trace
> > > > > > > attachment. Either you forgot to attach it or the mail system
> > > > > > > blocked it.
> > > > > > >
> > > > > > > Can you please check that your pom has the dependency shown
> > > > > > > below:
> > > > > > >
> > > > > > > <!-- https://mvnrepository.com/artifact/org.apache.parquet/parquet-hadoop -->
> > > > > > > <dependency>
> > > > > > >   <groupId>org.apache.parquet</groupId>
> > > > > > >   <artifactId>parquet-hadoop</artifactId>
> > > > > > >   <version>1.8.3</version>
> > > > > > > </dependency>
> > > > > > >
> > > > > > > Can you also run ```jar -tvf shahida.jar | grep -i
> > > > > > > CompressionCodecName``` and let us know the output that you
> > > > > > > see.
> > > > > > > Once we have the answers to the above, we can see what is
> > > > > > > missing and hopefully address it.
> > > > > > >
> > > > > > > Kabeer.
> > > > > > >
> > > > > > > On Oct 15 2019, at 10:39 am, Shahida Khan <[email protected]> wrote:
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > Hope you are doing well.
> > > > > > > > I am currently trying to implement the Hudi Utilities using
> > > > > > > > Delta Streamer.
> > > > > > > > Below is the command line configuration I am passing:
> > > > > > > >
> > > > > > > > spark2-submit --master yarn --deploy-mode cluster \
> > > > > > > >   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
> > > > > > > >   /tmp/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar \
> > > > > > > >   --props /user/oozie/dataops/hoodie/config.properties \
> > > > > > > >   --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
> > > > > > > >   --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
> > > > > > > >   --source-ordering-field LastModified_dtmStamp \
> > > > > > > >   --target-base-path /tmp/hudi-deltastreamer-op_TEST \
> > > > > > > >   --target-table testTableHoodie \
> > > > > > > >   --op UPSERT --enable-hive-sync --storage-type MERGE_ON_READ
> > > > > > > >
> > > > > > > > I have attached the config file too.
> > > > > > > > Unfortunately, while writing the files in parquet, it throws
> > > > > > > > an exception: "java.lang.NoClassDefFoundError:
> > > > > > > > org/apache/parquet/hadoop/metadata/CompressionCodecName"
> > > > > > > > The full error trace has been attached for your reference.
> > > > > > > >
> > > > > > > > There are a few warnings with respect to configuration, but I
> > > > > > > > am not sure if that's the problem.
> > > > > > > >
> > > > > > > > I have tried giving the classpath as well. I am not sure what
> > > > > > > > I am missing here.
> > > > > > > > It would be great if anybody could help me here.
> > > > > > > >
> > > > > > > > Hadoop version :- 2.6.0-cdh5.14.2
> > > > > > > > Spark version :- 2.3.0.cloudera2
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Shahida R.
> > > > > > > > Khan
> > > > > > > > +91 9167538366
> > > > > >
> > > > > > The information contained in this transmission may contain
> > > > > > privileged and confidential information of Big Tree Entertainment
> > > > > > Pvt Ltd, including information protected by privacy laws. It is
> > > > > > intended only for the use of Big Tree Entertainment Pvt Ltd. If
> > > > > > you are not the intended recipient, you are hereby notified that
> > > > > > any review, dissemination, distribution, or duplication of this
> > > > > > communication is strictly prohibited. If you are not the intended
> > > > > > recipient, please contact the sender by reply email and destroy
> > > > > > all copies of the original message. Although Big Tree
> > > > > > Entertainment Pvt Ltd. has taken reasonable precautions to ensure
> > > > > > no viruses are present in this email, Big Tree Entertainment Pvt
> > > > > > Ltd. cannot accept responsibility for any loss or damage arising
> > > > > > from the use of this email or attachments. Computer viruses can
> > > > > > be transmitted via email. Recipient should check the email and
> > > > > > any attachments for the presence of viruses before using them.
> > > > > > Any views or opinions are solely those of the author and do not
> > > > > > necessarily represent those of Big Tree Entertainment Pvt Ltd.
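
For anyone following the thread without the JDK `jar` tool at hand, the check suggested above (```jar -tvf shahida.jar | grep -i CompressionCodecName```) can be approximated with a few lines of Python, since a jar is a zip archive. This is only a minimal sketch: the function name `find_class_in_jar` is made up for illustration, and it only tells you whether a matching entry exists in one jar, not whether that jar actually ends up on the Spark driver or executor classpath.

```python
import zipfile


def find_class_in_jar(jar_path, class_name):
    """Return the jar entries whose path contains class_name.

    Mirrors `jar -tvf <jar> | grep -i <class_name>`: the match is a
    case-insensitive substring test on each entry path.
    find_class_in_jar is a hypothetical helper, not part of Hudi or Spark.
    """
    needle = class_name.lower()
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist() if needle in name.lower()]
```

Calling `find_class_in_jar("shahida.jar", "CompressionCodecName")` returns an empty list when the class was not bundled, which would be consistent with the NoClassDefFoundError reported above. If the class is present, the list should contain an entry ending in org/apache/parquet/hadoop/metadata/CompressionCodecName.class, possibly under a relocated (shaded) package prefix.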
