umehrot2 commented on a change in pull request #873: [HUDI-159] Redesigning bundles for lighter-weight integrations
URL: https://github.com/apache/incubator-hudi/pull/873#discussion_r320543511
##########
File path: packaging/hudi-hive-bundle/pom.xml
##########
@@ -169,53 +50,22 @@
</goals>
<configuration>
<createSourcesJar>true</createSourcesJar>
- <relocations>
- <relocation>
- <pattern>com.beust.</pattern>
- <shadedPattern>org.apache.hudi.com.beust.</shadedPattern>
- </relocation>
- <relocation>
- <pattern>org.joda.</pattern>
- <shadedPattern>org.apache.hudi.org.joda.</shadedPattern>
- </relocation>
- <relocation>
- <pattern>com.google.</pattern>
- <shadedPattern>org.apache.hudi.com.google.</shadedPattern>
- </relocation>
- <relocation>
- <pattern>org.slf4j.</pattern>
- <shadedPattern>org.apache.hudi.org.slf4j.</shadedPattern>
- </relocation>
- <relocation>
- <pattern>org.apache.commons.</pattern>
- <shadedPattern>org.apache.hudi.org.apache.commons.</shadedPattern>
- </relocation>
- <relocation>
- <pattern>parquet.column</pattern>
- <shadedPattern>org.apache.hudi.parquet.column</shadedPattern>
- </relocation>
- <relocation>
- <pattern>parquet.format.</pattern>
- <shadedPattern>org.apache.hudi.parquet.format.</shadedPattern>
- </relocation>
- <relocation>
- <pattern>parquet.hadoop.</pattern>
- <shadedPattern>org.apache.hudi.parquet.hadoop.</shadedPattern>
- </relocation>
- <relocation>
- <pattern>parquet.schema.</pattern>
- <shadedPattern>org.apache.hudi.parquet.schema.</shadedPattern>
- </relocation>
- </relocations>
- <createDependencyReducedPom>false</createDependencyReducedPom>
+ <dependencyReducedPomLocation>${project.build.directory}/dependency-reduced-pom.xml
+ </dependencyReducedPomLocation>
<artifactSet>
- <excludes>
- <exclude>log4j:log4j</exclude>
- <exclude>org.apache.hadoop:*</exclude>
- <exclude>org.apache.hive:*</exclude>
- <exclude>org.apache.derby:derby</exclude>
- </excludes>
+ <includes>
+ <include>org.apache.hudi:hudi-common</include>
+ <include>org.apache.hudi:hudi-hadoop-mr</include>
+ <include>org.apache.hudi:hudi-hive</include>
+
+ <include>com.beust:jcommander</include>
+ <include>org.apache.parquet:parquet-avro</include>
Review comment:
Although you have included `parquet-avro` for shading, it is not ending up in the jar. That is because it is being pulled in as a `provided` dependency, even though it comes via `hudi-hadoop-mr-bundle`, where you have marked it with `compile` scope. It is probably still picking up the `scope` from the parent pom, where it is `provided`.
This causes a `NoClassDefFoundError` (caused by a `ClassNotFoundException`) when querying the `Real Time` table in Hive:
```
hive> select record_key from catalog_sales_part_mor_050_sep3_1_rt limit 10;
OK
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/avro/AvroSchemaConverter
    at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:323)
    at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:105)
    at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:48)
    at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:67)
    at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:45)
    at org.apache.hudi.hadoop.realtime.HoodieRealtimeInputFormat.getRecordReader(HoodieRealtimeInputFormat.java:234)
    at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: java.lang.ClassNotFoundException: org.apache.parquet.avro.AvroSchemaConverter
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 24 more
```
I was able to work around this by declaring `parquet-avro` as a `compile`-scope dependency in this POM.
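A minimal sketch of that workaround, for illustration only. The exact declaration used is not shown in this comment, and the sketch assumes the `parquet-avro` version is managed by the parent pom, so no `<version>` is given. An explicit `<scope>` on a direct dependency takes precedence over the scope supplied by the parent's `<dependencyManagement>`. Because the maven-shade-plugin does not bundle `provided` artifacts, this override should make the artifact available to the `<include>` in the `artifactSet` above.

```xml
<!-- Sketch: goes inside the existing <dependencies> section of
     packaging/hudi-hive-bundle/pom.xml (assumed placement).
     Version is omitted on the assumption that the parent pom manages it. -->
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-avro</artifactId>
  <!-- Explicit scope overrides the managed 'provided' scope, so the
       shade plugin can pack this artifact into the bundle jar. -->
  <scope>compile</scope>
</dependency>
```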
----------------------------------------------------------------
This is an automated message from the Apache Git Service.