Thank you, Shahida. Can you please confirm that you have included both of the
dependencies below and rebuilt the jar?
If your build is missing parquet-hadoop, the required class
(org.apache.parquet.hadoop.metadata.CompressionCodecName) may not be found. If
you have already included both dependencies and it still doesn't work, I can
upload a jar for you to try.
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-avro</artifactId>
  <version>${parquet.version}</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-hadoop</artifactId>
  <version>1.8.3</version>
</dependency>
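If it helps to verify the built jar programmatically rather than with grep,
here is a minimal sketch in Java that lists every jar entry whose path contains
a given fragment (case-insensitive). The names JarClassFinder and findEntries
are hypothetical and are not part of Hudi or Parquet; the code only uses
java.util.jar from the JDK and assumes Java 8 or later.

```java
import java.io.IOException;
import java.util.List;
import java.util.Locale;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Hudi or Parquet): scans a jar for entries
// whose path contains a fragment, similar to `jar -tvf x.jar | grep -i frag`.
public class JarClassFinder {

    /** Returns the names of all entries in the jar that contain the fragment, ignoring case. */
    static List<String> findEntries(String jarPath, String fragment) throws IOException {
        String needle = fragment.toLowerCase(Locale.ROOT);
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.toLowerCase(Locale.ROOT).contains(needle))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JarClassFinder <jar-path> [fragment]");
            return;
        }
        // Default to the class named in the NoClassDefFoundError.
        String fragment = args.length > 1 ? args[1] : "CompressionCodecName";
        List<String> hits = findEntries(args[0], fragment);
        if (hits.isEmpty()) {
            System.out.println("No entry matching '" + fragment + "' in " + args[0]);
        } else {
            hits.forEach(System.out::println);
        }
    }
}
```

Running it against the bundle jar you pass to spark-submit should list
org/apache/parquet/hadoop/metadata/CompressionCodecName.class if parquet-hadoop
was packaged; an empty result would suggest the dependency did not make it
into the fat jar.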
On Oct 15 2019, at 2:28 pm, Shahida Khan <[email protected]>
wrote:
> Hi Kabeer,
>
> Thank you for the quick response!
> Also, our project already includes the dependency below, which I believe
> should pull in "org.apache.parquet.parquet-hadoop":
>
> <dependency>
>   <groupId>org.apache.parquet</groupId>
>   <artifactId>parquet-avro</artifactId>
>   <version>${parquet.version}</version>
> </dependency>
>
> I have even run ```jar -tvf shahida.jar | grep -i CompressionCodecName```;
> the class is not in the jar even after including the dependency in the build.
>
> Strangely, I have even provided the parquet-avro jar via spark-submit, and
> it behaves differently for 1.7 and 1.8.
> It seems some configuration is missing with respect to
> HoodieStorageConfig.PARQUET_COMPRESSION_CODEC.
>
> Regards,
> Shahida R. Khan
>
> On Tue, Oct 15, 2019 at 4:24 PM Kabeer Ahmed <[email protected]> wrote:
> > Shahida,
> >
> > Welcome to Hudi. I am not an expert on DeltaStreamer, as I do not use it.
> > In general, I think this points to a problem with the build of the fat jar:
> > either the fat jar was not built with all of its dependencies, or your
> > classpath did not include the jar that is needed.
> > For some reason I didn't receive the full stack trace attachment. Either you
> > forgot to attach it or the mail system blocked it.
> > Can you please check that your pom has the dependency shown below:
> > <!-- https://mvnrepository.com/artifact/org.apache.parquet/parquet-hadoop -->
> > <dependency>
> >   <groupId>org.apache.parquet</groupId>
> >   <artifactId>parquet-hadoop</artifactId>
> >   <version>1.8.3</version>
> > </dependency>
> >
> > Can you also run ```jar -tvf shahida.jar | grep -i CompressionCodecName```
> > and let us know the output?
> > Once we have answers to the above, we can see what is missing and
> > hopefully address it.
> > Kabeer.
> > On Oct 15 2019, at 10:39 am, Shahida Khan <[email protected]> wrote:
> > > Hi All,
> > >
> > > Hope you are doing well.
> > > I am currently trying to implement the Hudi Utilities using
> > > DeltaStreamer. Below is the command line I am passing:
> > >
> > > spark2-submit --master yarn --deploy-mode cluster \
> > >   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
> > >   /tmp/hudi-utilities-bundle-0.5.1-SNAPSHOT.jar \
> > >   --props /user/oozie/dataops/hoodie/config.properties \
> > >   --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
> > >   --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
> > >   --source-ordering-field LastModified_dtmStamp \
> > >   --target-base-path /tmp/hudi-deltastreamer-op_TEST \
> > >   --target-table testTableHoodie \
> > >   --op UPSERT --enable-hive-sync \
> > >   --storage-type MERGE_ON_READ
> > >
> > > I have also attached the config file.
> > >
> > > Unfortunately, while writing the Parquet files, it throws
> > > "java.lang.NoClassDefFoundError:
> > > org/apache/parquet/hadoop/metadata/CompressionCodecName".
> > > The full error trace is attached for your reference.
> > >
> > > There are a few warnings about the configuration, but I am not sure if
> > > that's the problem.
> > >
> > > I have tried setting the classpath as well. I am not sure what I am
> > > missing here.
> > > It would be great if anybody could help me here.
> > >
> > > Hadoop version :- 2.6.0-cdh5.14.2
> > > Spark version :- 2.3.0.cloudera2
> > >
> > >
> > > Regards,
> > > Shahida R. Khan
> > > +91 9167538366