Jonathan Vexler created HUDI-5110:
-------------------------------------
Summary: Utilities Bundle in Docker Demo is incomplete
Key: HUDI-5110
URL: https://issues.apache.org/jira/browse/HUDI-5110
Project: Apache Hudi
Issue Type: Bug
Reporter: Jonathan Vexler
The Docker Demo fails with the following exception when running step 2:
{code:java}
docker exec -it adhoc-2 /bin/bash
root@adhoc-2:/opt# spark-submit \
> --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
> --table-type COPY_ON_WRITE \
> --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
> --source-ordering-field ts \
> --target-base-path /user/hive/warehouse/stock_ticks_cow \
> --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
> --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
22/10/31 15:14:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/10/31 15:14:41 WARN deploy.SparkSubmit$$anon$2: Failed to load org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.
java.lang.ClassNotFoundException: org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:806)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/10/31 15:14:41 INFO util.ShutdownHookManager: Shutdown hook called
22/10/31 15:14:41 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-c2d663e6-ff44-462a-beb0-bae5d73d3669
{code}
Comparing the sizes of the bundle jars shows that the utilities bundle is significantly smaller than the others:
{code:java}
root@adhoc-2:/opt# ls -l /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/
total 173136
drwxr-xr-x 3 root root 96 Oct 31 15:07 antrun
-rw-r--r-- 1 root root 40597210 Oct 31 15:07 hoodie-hadoop-mr-bundle.jar
-rw-r--r-- 1 root root 36576220 Oct 31 15:07 hoodie-hive-sync-bundle.jar
-rw-r--r-- 1 root root 100091870 Oct 31 15:07 hoodie-spark-bundle.jar
-rw-r--r-- 1 root root 18336 Oct 31 15:07 hoodie-utilities.jar
drwxr-xr-x 3 root root 96 Oct 31 15:07 maven-shared-archive-resources
{code}
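One way to catch this failure mode before it reaches spark-submit is to check that the jar actually contains the entry-point class. This is only a sketch, not part of the demo scripts; the helper name {{check_class}} is made up, and it lists the archive with Python's zipfile module so it works even where JDK tools like {{jar tf}} are unavailable:
{code:java}
#!/bin/sh
# Sketch: fail fast if a jar is missing a given class, instead of letting
# spark-submit die later with ClassNotFoundException.
check_class() {
  jar_path="$1"
  class_name="$2"
  # org.foo.Bar -> org/foo/Bar.class, the entry name inside the jar
  entry="$(printf '%s' "$class_name" | tr '.' '/').class"
  if python3 -m zipfile -l "$jar_path" 2>/dev/null | grep -q "$entry"; then
    echo "OK: $entry found in $jar_path"
  else
    echo "MISSING: $entry not in $jar_path" >&2
    return 1
  fi
}

# Example (path and class taken from the failing command above):
# check_class /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar \
#   org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
{code}
Run against the 18 KB {{hoodie-utilities.jar}} above, a check like this would report the DeltaStreamer class as missing, confirming the bundle was built without its shaded dependencies.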
A quick workaround to run the demo is to replace that jar with a copy of the
complete bundle by running:
{code:java}
docker cp packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.0-SNAPSHOT.jar \
  adhoc-2:/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar
{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)