Shahida

Thanks for trying out the various options. I do not work on the CDH platform, so I 
am hoping that someone who runs the Cloudera platform will help you with the jar. If I 
remember right, the CDH jars are deployed under 
/etc/cloudera/cdh/version, and you should be able to find the jar path there to 
include on your classpath. I can also see that this file is referenced in the 
Cloudera documentation at: 
https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_ig_parquet.html
(this is the same CDH 5.14 that you are using). Please try searching for a 
jar named parquet-hadoop-1.8.*.jar.
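
A minimal sketch of that search, assuming a POSIX shell with find on one of the cluster nodes. The helper name find_parquet_hadoop_jars is made up for illustration, and the directory in the example call is only a guess at a common parcel location; pass whichever root actually holds the CDH jars on your install.

```shell
# Hypothetical helper: list parquet-hadoop 1.8.x jars under a directory tree.
find_parquet_hadoop_jars() {
  # $1: directory to search (defaults to the current directory).
  # Errors from unreadable directories are discarded.
  find "${1:-.}" -type f -name 'parquet-hadoop-1.8.*.jar' 2>/dev/null
}

# Example call (the path is an assumption, adjust it for your nodes):
# find_parquet_hadoop_jars /opt/cloudera/parcels
```

Any path it prints can then be put on the classpath, for example through the --jars option of spark2-submit.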
On another thread (https://github.com/bigdatagenomics/adam/issues/1742)
I also see the option of using: --packages 
org.apache.parquet:parquet-hadoop:1.8.2.
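
As a rough sketch of how that option could be added to your existing command: the function below only prints the command so it can be reviewed before running. The function name print_submit_cmd is made up for illustration, the jar path and class are copied from your original mail, and it assumes the submitting host can reach a Maven repository to resolve the package.

```shell
# Maven coordinates from the linked thread; pick the version that matches your cluster.
PARQUET_PKG="org.apache.parquet:parquet-hadoop:1.8.2"

# Hypothetical helper: print (not run) the spark2-submit command with --packages added.
# Extra DeltaStreamer arguments are appended after the application jar.
print_submit_cmd() {
  printf '%s ' spark2-submit --master yarn --deploy-mode cluster \
    --packages "$PARQUET_PKG" \
    --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
    /tmp/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar "$@"
  printf '\n'
}

# Example: print_submit_cmd --props /user/oozie/dataops/hoodie/config.properties --op UPSERT
```

Note that --packages has to appear before the application jar; anything after the jar is passed to the application itself.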
If nothing works, then we can use the brute-force method of downloading the jar 
manually from the link below and placing it on the classpath.
https://mvnrepository.com/artifact/org.apache.parquet/parquet-hadoop/1.8.3

Can you please try these options and report back with your observations?
I do understand why the parquet-hadoop dependency in the pom.xml didn't do the magic: 
it seems the Hudi code doesn't have a real dependency on it. I still need 
to check the dependency tree to see whether this jar is included through parquet-avro.
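
One way to check the dependency tree, sketched under the assumption that Maven is on the PATH and the command is run from the module that builds your bundle jar. dependency:tree and its includes filter come from the standard maven-dependency-plugin; the helper name parquet_hadoop_in_tree is made up for illustration.

```shell
# Hypothetical helper: read `mvn dependency:tree` output on stdin and print
# only the lines that mention the parquet-hadoop artifact (not parquet-hadoop-bundle).
parquet_hadoop_in_tree() {
  grep -E 'org\.apache\.parquet:parquet-hadoop:'
}

# Example (run from the module that produces the fat jar):
# mvn dependency:tree -Dincludes=org.apache.parquet | parquet_hadoop_in_tree
```

If nothing is printed, parquet-hadoop is not being resolved for that module at all, either directly or through parquet-avro.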
There are loads of users from companies like Uber on this thread who use Hudi 
on CDH. Someone must have a solution for your issue.
On Oct 15 2019, at 2:51 pm, Shahida Khan <[email protected]> 
wrote:
> Hi Kabeer,
>
> I have added both dependencies and tried the build.
> The only change is the version: I have used parquet-hadoop 1.8.1, since
> parquet-avro is 1.8.1.
> It looks like this:
>
> <parquet.version>1.8.1</parquet.version>
>
> <dependency>
> <groupId>org.apache.parquet</groupId>
> <artifactId>parquet-avro</artifactId>
> <version>${parquet.version}</version>
> <!-- <scope>provided</scope> -->
> </dependency>
> <dependency>
> <groupId>org.apache.parquet</groupId>
> <artifactId>parquet-hadoop</artifactId>
> <version>${parquet.version}</version>
> </dependency>
>
>
> Regards,
> *Shahida R. Khan*
>
>
> On Tue, Oct 15, 2019 at 7:12 PM Kabeer Ahmed <[email protected]> wrote:
> > Thank you Shahida. Can you please confirm that you have included both the
> > below dependencies and tried the build?
> >
> > If your build is missing parquet-hadoop, then the required class may not
> > be found. If you have already included the below dependencies and it still
> > doesn't work, I can upload a jar for you to try.
> > <dependency>
> > <groupId>org.apache.parquet</groupId>
> > <artifactId>parquet-avro</artifactId>
> > <version>${parquet.version}</version>
> > <scope>provided</scope>
> > </dependency>
> >
> > <dependency>
> > <groupId>org.apache.parquet</groupId>
> > <artifactId>parquet-hadoop</artifactId>
> > <version>1.8.3</version>
> > </dependency>
> > On Oct 15 2019, at 2:28 pm, Shahida Khan <[email protected]> wrote:
> > > Hi Kabeer,
> > >
> > > Thank you for the quick response!
> > > Also, our project already includes the dependency below, which I believe
> > > should pull in "org.apache.parquet.parquet-hadoop".
> > >
> > >
> > > <dependency>
> > > <groupId>org.apache.parquet</groupId>
> > > <artifactId>parquet-avro</artifactId>
> > > <version>${parquet.version}</version>
> > > </dependency>
> > >
> > >
> > > I have even run ```jar -tvf shahida.jar | grep -i CompressionCodecName```:
> > > the class is not in the jar, even after including the dependency in the build.
> > >
> > > Strangely, I have also provided the parquet-avro jar via spark-submit,
> > > and it behaves differently for 1.7 and 1.8.
> > > It seems some configuration is missing with respect to
> > > HoodieStorageConfig.PARQUET_COMPRESSION_CODEC.
> > >
> > > Regards,
> > > Shahida R. Khan
> > >
> > > On Tue, Oct 15, 2019 at 4:24 PM Kabeer Ahmed <[email protected]> wrote:
> > > > Shahida
> > > >
> > > > Welcome to Hudi. I am not an expert with DeltaStreamer as I do not use
> > > > it. In general, I think this points to an issue with the build of the fat jar.
> > > > It looks to me that either you didn't build the fat jar to include all the
> > > > dependencies or your classpath didn't include the jar needed.
> > > > For some reason I didn't receive the full stack trace attachment.
> > > > Either you forgot to attach it or the mail system blocked it.
> > > > Can you please check
> > > > that your pom has the dependency shown below:
> > > > <!-- https://mvnrepository.com/artifact/org.apache.parquet/parquet-hadoop -->
> > > > <dependency>
> > > > <groupId>org.apache.parquet</groupId>
> > > > <artifactId>parquet-hadoop</artifactId>
> > > > <version>1.8.3</version>
> > > > </dependency>
> > > >
> > > > Can you also run ```jar -tvf shahida.jar | grep -i CompressionCodecName```
> > > > and let us know the output that you see.
> > > > Once we have the answers to the above, we can see what is missing and
> > > > hopefully address that.
> > > > Kabeer.
> > > > On Oct 15 2019, at 10:39 am, Shahida Khan <[email protected]> wrote:
> > > > > Hi All,
> > > > >
> > > > > Hope you are doing well.
> > > > > I am currently trying to implement the Hudi Utilities using
> > > > > DeltaStreamer. Below is the command line I am passing:
> > > > >
> > > > > spark2-submit --master yarn --deploy-mode cluster \
> > > > >   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
> > > > >   /tmp/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar \
> > > > >   --props /user/oozie/dataops/hoodie/config.properties \
> > > > >   --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
> > > > >   --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
> > > > >   --source-ordering-field LastModified_dtmStamp \
> > > > >   --target-base-path /tmp/hudi-deltastreamer-op_TEST \
> > > > >   --target-table testTableHoodie --op UPSERT --enable-hive-sync \
> > > > >   --storage-type MERGE_ON_READ
> > > > >
> > > > > I have also attached the config file.
> > > > > Unfortunately, while writing the files in Parquet, it throws the
> > > > > exception "java.lang.NoClassDefFoundError:
> > > > > org/apache/parquet/hadoop/metadata/CompressionCodecName".
> > > > > The full error trace is attached for your reference.
> > > > >
> > > > > There are a few warnings with respect to configuration, but I am not sure if
> > > > > that's the problem.
> > > > >
> > > > > I have tried giving the classpath as well. I am not sure what I am
> > > > > missing here.
> > > > > It would be great if anybody could help me here.
> > > > >
> > > > > Hadoop version :- 2.6.0-cdh5.14.2
> > > > > Spark version :- 2.3.0.cloudera2
> > > > >
> > > > >
> > > > > Regards,
> > > > > Shahida R. Khan
> > > > > +91 9167538366
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > > The information contained in this transmission may contain privileged
> > and confidential information of Big Tree Entertainment Pvt Ltd, including
> > information protected by privacy laws. It is intended only for the use of
> > Big Tree Entertainment Pvt Ltd. If you are not the intended recipient, you
> > are hereby notified that any review, dissemination, distribution, or
> > duplication of this communication is strictly prohibited. If you are not
> > the intended recipient, please contact the sender by reply email and
> > destroy all copies of the original message. Although Big Tree Entertainment
> > Pvt Ltd. has taken reasonable precautions to ensure no viruses are present
> > in this email, Big Tree Entertainment Pvt Ltd. cannot accept responsibility
> > for any loss or damage arising from the use of this email or attachments.
> > Computer viruses can be transmitted via email. Recipient should check the
> > email and any attachments for the presence of viruses before using them.
> > Any views or opinions are solely those of the
> > author and do not necessarily represent those of Big Tree Entertainment
> > Pvt Ltd.
> >
>
>
