Jonathan Vexler created HUDI-5110:
-------------------------------------

             Summary: Utilities Bundle in Docker Demo is incomplete
                 Key: HUDI-5110
                 URL: https://issues.apache.org/jira/browse/HUDI-5110
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Jonathan Vexler


The Docker Demo fails with the following exception when running step 2:
{code:java}
docker exec -it adhoc-2 /bin/bash
root@adhoc-2:/opt# spark-submit \
>   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
>   --table-type COPY_ON_WRITE \
>   --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
>   --source-ordering-field ts \
>   --target-base-path /user/hive/warehouse/stock_ticks_cow \
>   --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
>   --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
22/10/31 15:14:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/10/31 15:14:41 WARN deploy.SparkSubmit$$anon$2: Failed to load org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.
java.lang.ClassNotFoundException: org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:806)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/10/31 15:14:41 INFO util.ShutdownHookManager: Shutdown hook called
22/10/31 15:14:41 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-c2d663e6-ff44-462a-beb0-bae5d73d3669
{code}
If you look at the sizes of the bundle jars, the utilities bundle (18 KB) is orders of magnitude smaller than the others:
{code:java}
root@adhoc-2:/opt# ls -l /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/
total 173136
drwxr-xr-x 3 root root        96 Oct 31 15:07 antrun
-rw-r--r-- 1 root root  40597210 Oct 31 15:07 hoodie-hadoop-mr-bundle.jar
-rw-r--r-- 1 root root  36576220 Oct 31 15:07 hoodie-hive-sync-bundle.jar
-rw-r--r-- 1 root root 100091870 Oct 31 15:07 hoodie-spark-bundle.jar
-rw-r--r-- 1 root root     18336 Oct 31 15:07 hoodie-utilities.jar
drwxr-xr-x 3 root root        96 Oct 31 15:07 maven-shared-archive-resources 
{code}
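The size difference suggests the shaded classes are missing from hoodie-utilities.jar entirely. Since a jar is just a zip archive, one way to confirm whether a given bundle actually contains the DeltaStreamer class is to check its entry list; a minimal sketch in Python (the helper name and the commented-out path are illustrative, not part of the demo):

```python
# Illustrative helper: a jar is a zip archive, so we can check its
# entry list for the expected .class file.
import zipfile


def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the jar has an entry for the fully-qualified class."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Against the incomplete 18 KB bundle this would return False, e.g.:
# jar_contains_class(
#     "/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar",
#     "org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer")
```

(`jar tf <jar> | grep HoodieDeltaStreamer` inside the container does the same check.)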
A quick workaround to run the demo is to replace that jar with a copy of the complete bundle by running:
{code:java}
docker cp packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.0-SNAPSHOT.jar adhoc-2:/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)