I completely understand the Hive 1.x situation. I am juggling multiple things
at the moment. Let me think more over the weekend and get back if I find a way
to make Hive 1 work now.

On Thu, Oct 17, 2019 at 7:40 AM Shahida Khan <[email protected]> wrote:

> Yes Kabeer, you are absolutely right with respect to my requirements.
> Sorry for the lack of updates; I was not well.
>
> I tried a couple of options, and one of your suggestions, forcefully adding
> the jar to DeltaStreamer, worked!
> Unfortunately, as I expected and as Vinoth said in a previous email,
> I hit a bump with Hive 1.x.
> I am currently working on that.
>
>
> Regards,
> Shahida K
>
>
> On Thu, 17 Oct 2019 at 7:26 PM, Kabeer Ahmed <[email protected]> wrote:
>
> > Shahida,
> >
> > How have you been faring with the setup? Did any of the options work for
> > you?
> > If you are still out of luck, I can spin up a CDH cluster in Docker and
> > model your test case. From your emails, am I right in assuming that all
> > you want to do is use DeltaStreamer to ingest from Kafka into Hudi and
> > create a Hive table on top of that?
> > Thanks
> > Kabeer.
> >
> > On Oct 16 2019, at 9:19 am, Shahida Khan <[email protected]
> .INVALID>
> > wrote:
> > > Thanks, Kabeer, for helping me with the doc. I am currently looking into
> > > it and trying multiple options; if I come across anything, I will keep
> > > you posted.
> > >
> > > @Vinoth
> > > As suggested, I have whitelisted the parquet dependency in the
> > > above-mentioned pom file, but I still get the same error message.
> > >
> > > As of now, I am using the Hudi-Utilities jar directly; there is no
> > > separate project of my own.
> > > Also, with respect to Hive, I am aware of the issue. As of now I have
> > > put the Hive 1.x URL used by CDH in the config.
> > >
> > > I am attaching the logs again for reference.
> > >
> > > Regards,
> > > Shahida R. Khan
> > >
> > >
> > > On Tue, Oct 15, 2019 at 9:22 PM Vinoth Chandar <[email protected]
> > (mailto:[email protected])> wrote:
> > > > > I have added both the dependency and tried too.
> > > > If you are trying to get the hudi-utilities bundle to include a jar,
> > > > then you also need to whitelist it explicitly here:
> > > >
> > > > https://github.com/apache/incubator-hudi/blob/master/packaging/hudi-utilities-bundle/pom.xml#L67
> > > >
> > > > Heads up: you may hit issues with Hive, since CDH Hive is still 1.x
> > > > (the world is slowly moving to Hive 3+, and all other cloud/distro
> > > > vendors are on Hive 2.x).
> > > >
> > > > On Tue, Oct 15, 2019 at 8:33 AM Kabeer Ahmed <[email protected]
> > (mailto:[email protected])> wrote:
> > > > > Shahida
> > > > >
> > > > > > Thanks for trying out various options. I do not work on the CDH
> > > > > > platform, so I am hoping that someone who uses the Cloudera
> > > > > > platform will help you with the jar. If I remember right, there are
> > > > > > CDH jars deployed under /etc/cloudera/cdh/version and you should be
> > > > > > able to find the jar path to include on your classpath. I can also
> > > > > > see that this file is referenced in the Cloudera documentation at:
> > > > > >
> > > > > > https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_ig_parquet.html
> > > > > >
> > > > > > (this is the same CDH 5.14 that you are using). Please try to
> > > > > > search for the jar with the name: parquet-hadoop-1.8.*.jar.
> > > > > > On another thread
> > > > > > (https://github.com/bigdatagenomics/adam/issues/1742)
> > > > > > I also see an option of using: --packages
> > > > > > org.apache.parquet:parquet-hadoop:1.8.2.
> > > > > > If nothing works, then we can use the brute-force method of
> > > > > > downloading the jar manually from the link below and placing it on
> > > > > > the classpath.
> > > > > >
> > > > > > https://mvnrepository.com/artifact/org.apache.parquet/parquet-hadoop/1.8.3
> > > > >
> > > > > > Can you please try these options and report back with your
> > > > > > observations?
> > > > > > I do understand why the parquet-hadoop dependency in the pom.xml
> > > > > > didn't do the magic: it seems the Hudi code doesn't have a real
> > > > > > dependency on it. I still need to check the dependency tree to see
> > > > > > whether this jar is included through parquet-avro.
> > > > > > There are loads of users from companies like Uber on this thread
> > > > > > who use Hudi on CDH. Someone must have a solution for your issue.
> > > > > On Oct 15 2019, at 2:51 pm, Shahida Khan <
> > [email protected] (mailto:[email protected]
> ).INVALID>
> > > > > wrote:
> > > > > > Hi Kabeer,
> > > > > >
> > > > > > > I have added both dependencies and tried that too.
> > > > > > > The only change is the version: I used parquet-hadoop 1.8.1, since
> > > > > > > parquet-avro is 1.8.1.
> > > > > > > It looks like this:
> > > > > > >
> > > > > > > <parquet.version>1.8.1</parquet.version>
> > > > > > >
> > > > > > > <dependency>
> > > > > > >   <groupId>org.apache.parquet</groupId>
> > > > > > >   <artifactId>parquet-avro</artifactId>
> > > > > > >   <version>${parquet.version}</version>
> > > > > > >   <!-- <scope>provided</scope> -->
> > > > > > > </dependency>
> > > > > > > <dependency>
> > > > > > >   <groupId>org.apache.parquet</groupId>
> > > > > > >   <artifactId>parquet-hadoop</artifactId>
> > > > > > >   <version>${parquet.version}</version>
> > > > > > > </dependency>
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > *Shahida R. Khan*
> > > > > >
> > > > > >
> > > > > > On Tue, Oct 15, 2019 at 7:12 PM Kabeer Ahmed <
> [email protected]
> > (mailto:[email protected])>
> > > > > wrote:
> > > > > > > > Thank you, Shahida. Can you please confirm that you have included
> > > > > > > > both of the dependencies below and tried the build?
> > > > > > > >
> > > > > > > > If your build is missing parquet-hadoop, then the required class
> > > > > > > > may not be found. If you have already included the dependencies
> > > > > > > > below and it still doesn't work, I can upload a jar for you to try.
> > > > > > > <dependency>
> > > > > > > <groupId>org.apache.parquet</groupId>
> > > > > > > <artifactId>parquet-avro</artifactId>
> > > > > > > <version>${parquet.version}</version>
> > > > > > > <scope>provided</scope>
> > > > > > > </dependency>
> > > > > > >
> > > > > > > <dependency>
> > > > > > > <groupId>org.apache.parquet</groupId>
> > > > > > > <artifactId>parquet-hadoop</artifactId>
> > > > > > > <version>1.8.3</version>
> > > > > > > </dependency>
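One way to confirm which Parquet artifacts a pom declares, and with what scope, is to parse it rather than read it by eye. The sketch below is an illustration only: the helper name `list_dependencies` is made up, property placeholders such as `${parquet.version}` are reported unresolved, a scope that is commented out in the XML is treated as absent (Maven's default scope, `compile`, is reported instead), and every `<dependency>` element in the file is scanned, including any under `dependencyManagement` or plugins.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace such as {http://maven.apache.org/POM/4.0.0}."""
    return tag.rsplit("}", 1)[-1]


def list_dependencies(pom_xml, group_id):
    """Return (artifactId, version, scope) for each dependency of group_id."""
    root = ET.fromstring(pom_xml)
    found = []
    for elem in root.iter():
        if _local(elem.tag) != "dependency":
            continue
        # Collect the child elements (groupId, artifactId, version, scope, ...).
        fields = {_local(child.tag): (child.text or "").strip() for child in elem}
        if fields.get("groupId") == group_id:
            found.append((fields.get("artifactId"),
                          fields.get("version"),
                          fields.get("scope", "compile")))
    return found
```

For a real file, something like `list_dependencies(open("pom.xml").read(), "org.apache.parquet")` would list the declared Parquet artifacts. It does not replace inspecting the resolved dependency tree, since transitive dependencies are not visible in the pom itself.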
> > > > > > > On Oct 15 2019, at 2:28 pm, Shahida Khan <
> > [email protected] (mailto:[email protected])
> > > > > .INVALID>
> > > > > > > wrote:
> > > > > > > > Hi Kabeer,
> > > > > > > >
> > > > > > > > > Thank you for the quick response!
> > > > > > > > > Also, our project already includes the dependency below, which I
> > > > > > > > > believe should bring in "org.apache.parquet.parquet-hadoop":
> > > > > > > >
> > > > > > > >
> > > > > > > > <dependency>
> > > > > > > > <groupId>org.apache.parquet</groupId>
> > > > > > > > <artifactId>parquet-avro</artifactId>
> > > > > > > > <version>${parquet.version}</version>
> > > > > > > > </dependency>
> > > > > > > >
> > > > > > > >
> > > > > > > > > I have even run ```jar -tvf shahida.jar | grep -i CompressionCodecName```;
> > > > > > > > > the class is not available in the jar even after including the
> > > > > > > > > dependency in the build.
> > > > > > > >
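The `jar -tvf ... | grep -i` check can also be scripted, which helps when several candidate jars need to be inspected. This is a minimal sketch that relies only on a jar being a zip archive. The helper name `find_entries` is invented for illustration, and the match is a case-insensitive substring match to mirror `grep -i`.

```python
import sys
import zipfile


def find_entries(jar_path, needle):
    """List jar entries whose path contains `needle`, ignoring case."""
    needle = needle.lower()
    # A jar is a zip archive, so the standard zipfile module can list it.
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist() if needle in name.lower()]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_entries.py shahida.jar CompressionCodecName
    matches = find_entries(sys.argv[1], sys.argv[2])
    print("\n".join(matches) if matches else "no matching entries")
```

An empty result for `CompressionCodecName` would be consistent with the `NoClassDefFoundError` reported in this thread, since it means the class is not packaged in that jar.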
> > > > > > > > > Strangely, I have even provided the parquet-avro jar via
> > > > > > > > > spark-submit, and it behaves differently for 1.7 and 1.8.
> > > > > > > > > It seems some configuration is missing with respect to
> > > > > > > > > HoodieStorageConfig.PARQUET_COMPRESSION_CODEC.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Shahida R. Khan
> > > > > > > >
> > > > > > > > On Tue, Oct 15, 2019 at 4:24 PM Kabeer Ahmed <
> > [email protected] (mailto:[email protected])
> > > > > > > (mailto:[email protected])> wrote:
> > > > > > > > > Shahida
> > > > > > > > >
> > > > > > > > > > Welcome to Hudi. I am not an expert with DeltaStreamer, as I do
> > > > > > > > > > not use it. In general, I think this points to an issue with the
> > > > > > > > > > build of the fat jar: either you didn't build the fat jar to
> > > > > > > > > > include all the dependencies, or your classpath didn't include
> > > > > > > > > > the jar needed.
> > > > > > > > > > For some reason I didn't receive the full stack trace
> > > > > > > > > > attachment. Either you forgot to attach it or the mail system
> > > > > > > > > > blocked it.
> > > > > > > > > > Can you please check that your pom has the dependency shown
> > > > > > > > > > below:
> > > > > > > > > > <!-- https://mvnrepository.com/artifact/org.apache.parquet/parquet-hadoop -->
> > > > > > > > > > <dependency>
> > > > > > > > > > <groupId>org.apache.parquet</groupId>
> > > > > > > > > > <artifactId>parquet-hadoop</artifactId>
> > > > > > > > > > <version>1.8.3</version>
> > > > > > > > > > </dependency>
> > > > > > > > > >
> > > > > > > > > > Can you also run ```jar -tvf shahida.jar | grep -i CompressionCodecName```
> > > > > > > > > > and let us know the output that you see?
> > > > > > > > > > Once we have the answers to the above, we can see what is missing
> > > > > > > > > > and hopefully address that.
> > > > > > > > > Kabeer.
> > > > > > > > > On Oct 15 2019, at 10:39 am, Shahida Khan <
> > [email protected] (mailto:[email protected])
> > > > > > > >
> > > > > > >
> > > > > > > (mailto:[email protected])> wrote:
> > > > > > > > > > Hi All,
> > > > > > > > > >
> > > > > > > > > > Hope you are doing well.
> > > > > > > > > > > I am currently trying to implement the Hudi Utilities using
> > > > > > > > > > > DeltaStreamer. Below is the command-line configuration I am
> > > > > > > > > > > passing:
> > > > > > > > > > >
> > > > > > > > > > > spark2-submit --master yarn --deploy-mode cluster \
> > > > > > > > > > >   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
> > > > > > > > > > >   /tmp/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar \
> > > > > > > > > > >   --props /user/oozie/dataops/hoodie/config.properties \
> > > > > > > > > > >   --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
> > > > > > > > > > >   --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
> > > > > > > > > > >   --source-ordering-field LastModified_dtmStamp \
> > > > > > > > > > >   --target-base-path /tmp/hudi-deltastreamer-op_TEST \
> > > > > > > > > > >   --target-table testTableHoodie --op UPSERT --enable-hive-sync \
> > > > > > > > > > >   --storage-type MERGE_ON_READ
> > > > > > > > > > >
> > > > > > > > > > > I have also attached the config file.
> > > > > > > > > > > Unfortunately, while writing the files in parquet, it throws the
> > > > > > > > > > > exception "java.lang.NoClassDefFoundError:
> > > > > > > > > > > org/apache/parquet/hadoop/metadata/CompressionCodecName".
> > > > > > > > > > > The full error trace is attached for your reference.
> > > > > > > > > > >
> > > > > > > > > > > There are a few warnings with respect to configuration, but I am
> > > > > > > > > > > not sure if that's the problem.
> > > > > > > > > > >
> > > > > > > > > > > I have tried giving the classpath as well. I am not sure what I am
> > > > > > > > > > > missing here.
> > > > > > > > > > > It would be great if anybody could help me here.
> > > > > > > > > >
> > > > > > > > > > Hadoop version :- 2.6.0-cdh5.14.2
> > > > > > > > > > Spark version :- 2.3.0.cloudera2
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Shahida R. Khan
> > > > > > > > > > +91 9167538366
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > > The information contained in this transmission may contain
> > > > > > > > > privileged and confidential information of Big Tree Entertainment
> > > > > > > > > Pvt Ltd, including information protected by privacy laws. It is
> > > > > > > > > intended only for the use of Big Tree Entertainment Pvt Ltd. If you
> > > > > > > > > are not the intended recipient, you are hereby notified that any
> > > > > > > > > review, dissemination, distribution, or duplication of this
> > > > > > > > > communication is strictly prohibited. If you are not the intended
> > > > > > > > > recipient, please contact the sender by reply email and destroy all
> > > > > > > > > copies of the original message. Although Big Tree Entertainment Pvt
> > > > > > > > > Ltd. has taken reasonable precautions to ensure no viruses are
> > > > > > > > > present in this email, Big Tree Entertainment Pvt Ltd. cannot accept
> > > > > > > > > responsibility for any loss or damage arising from the use of this
> > > > > > > > > email or attachments. Computer viruses can be transmitted via email.
> > > > > > > > > Recipient should check the email and any attachments for the
> > > > > > > > > presence of viruses before using them. Any views or opinions are
> > > > > > > > > solely those of the author and do not necessarily represent those of
> > > > > > > > > Big Tree Entertainment Pvt Ltd.
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> >
> --
> Regards,
> Shahida Rashid Khan
> 9167538366
>
>
>
>
> ****kindly ignore typo error **** Sent from handheld device ...*****
>
