data-storyteller opened a new issue #4621:
URL: https://github.com/apache/hudi/issues/4621


   
   **Describe the problem you faced**
   
   Running the Hudi integration test suite on the Docker setup fails: the `first_validate` node throws `java.lang.ClassNotFoundException: Failed to find data source: hudi` (full stacktrace below).
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Bring up the Hudi Docker demo setup.
   2. Run the integration test suite with the `spark-submit` command listed under **Additional context** (invoked via `run-intig-test.sh`).
   3. The run fails at the `first_validate` node with the stacktrace below.
   
   **Expected behavior**
   
   The integration test suite runs to completion without errors.
   
   **Environment Description**
   
   * Hudi version : latest (master)
   
   * Spark version : 2.4.7
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) : yes
   
   
   **Additional context**
   Running the integ test on the Docker setup. The tests fail with the following 
stacktrace.
   Command:
   ```
   docker exec -i adhoc-2 /bin/bash spark-submit \
     --packages org.apache.spark:spark-avro_2.11:2.4.0 \
     --conf spark.task.cpus=1 \
     --conf spark.executor.cores=1 \
     --conf spark.task.maxFailures=100 \
     --conf spark.memory.fraction=0.4 \
     --conf spark.rdd.compress=true \
     --conf spark.kryoserializer.buffer.max=2000m \
     --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
     --conf spark.memory.storageFraction=0.1 \
     --conf spark.shuffle.service.enabled=true \
     --conf spark.sql.hive.convertMetastoreParquet=false \
     --conf spark.driver.maxResultSize=12g \
     --conf spark.executor.heartbeatInterval=120s \
     --conf spark.network.timeout=600s \
     --conf spark.yarn.max.executor.failures=10 \
     --conf spark.sql.catalogImplementation=hive \
     --conf spark.driver.extraClassPath=/var/demo/jars/* \
     --conf spark.executor.extraClassPath=/var/demo/jars/* \
     --class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob /opt/$HUDI_JAR_NAME \
     --source-ordering-field test_suite_source_ordering_field \
     --target-base-path /user/hive/warehouse/hudi-integ-test-suite/output \
     --input-base-path /user/hive/warehouse/hudi-integ-test-suite/input \
     --target-table table1 \
     --props file:/var/hoodie/ws/docker/demo/config/test-suite/$PROP_FILE \
     --schemaprovider-class org.apache.hudi.integ.testsuite.schema.TestSuiteFileBasedSchemaProvider \
     --source-class org.apache.hudi.utilities.sources.AvroDFSSource \
     --input-file-size 125829120 \
     --workload-yaml-path file:/var/hoodie/ws/docker/demo/config/test-suite/$YAML_NAME \
     --workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator \
     --table-type $TABLE_TYPE \
     --compact-scheduling-minshare 1 \
     $EXTRA_SPARK_ARGS \
     --clean-input --clean-output
   ```
   
   
   **Stacktrace**
   
   ```
   
   22/01/17 06:36:13 INFO DagNode: Configs : {"name":"a89cea37-7224-4f36-8c00-90306ddf6172","record_size":1000,"repeat_count":1,"num_partitions_insert":1,"num_records_insert":300,"config":"third_insert"}
   22/01/17 06:36:13 INFO DagNode: Inserting input data a89cea37-7224-4f36-8c00-90306ddf6172
   22/01/17 06:36:13 INFO HoodieTestSuiteJob: Using DFSTestSuitePathSelector, checkpoint: Option{val=2} sourceLimit: 9223372036854775807 lastBatchId: 2 nextBatchId: 3
   00:09  WARN: Timeline-server-based markers are configured as the marker type but embedded timeline server is not enabled.  Falling back to direct markers.
   00:10  WARN: Timeline-server-based markers are configured as the marker type but embedded timeline server is not enabled.  Falling back to direct markers.
   00:12  WARN: Timeline-server-based markers are configured as the marker type but embedded timeline server is not enabled.  Falling back to direct markers.
   22/01/17 06:36:16 INFO DagScheduler: Finished executing a89cea37-7224-4f36-8c00-90306ddf6172
   22/01/17 06:36:16 WARN DagScheduler: Executing node "first_hive_sync" :: {"queue_name":"adhoc","engine":"mr","name":"994a5035-0362-4c9a-a7d7-e47397f2b113","config":"first_hive_sync"}
   22/01/17 06:36:16 INFO DagNode: Executing hive sync node
   22/01/17 06:36:19 INFO DagScheduler: Finished executing 994a5035-0362-4c9a-a7d7-e47397f2b113
   22/01/17 06:36:19 WARN DagScheduler: Executing node "first_validate" :: {"name":"3f562e32-b7d8-4d96-a977-44b6b876c333","validate_hive":false,"config":"first_validate"}
   22/01/17 06:36:19 WARN DagNode: Validation using data from input path /user/hive/warehouse/hudi-integ-test-suite/input/*/*
   22/01/17 06:36:21 INFO ValidateDatasetNode: Validate data in target hudi path /user/hive/warehouse/hudi-integ-test-suite/output/*/*/*
   22/01/17 06:36:21 ERROR DagScheduler: Exception executing node
   java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
       at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
       at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
       at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
       at org.apache.hudi.integ.testsuite.dag.nodes.ValidateDatasetNode.getDatasetToValidate(ValidateDatasetNode.java:52)
       at org.apache.hudi.integ.testsuite.dag.nodes.BaseValidateDatasetNode.execute(BaseValidateDatasetNode.java:99)
       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:139)
       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:105)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.ClassNotFoundException: hudi.DefaultSource
       at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
       at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
       at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
       at scala.util.Try$.apply(Try.scala:192)
       at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
       at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
       at scala.util.Try.orElse(Try.scala:84)
       at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
       ... 11 more
   22/01/17 06:36:21 INFO DagScheduler: Forcing shutdown of executor service, this might kill running tasks
   22/01/17 06:36:21 ERROR HoodieTestSuiteJob: Failed to run Test Suite
   java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
       at java.util.concurrent.FutureTask.report(FutureTask.java:122)
       at java.util.concurrent.FutureTask.get(FutureTask.java:206)
       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.execute(DagScheduler.java:113)
       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.schedule(DagScheduler.java:68)
       at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:203)
       at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:170)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
       at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:845)
       at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
       at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
       at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
       at org.apache.spark.deploy.SparkSubmit$anon$2.doSubmit(SparkSubmit.scala:920)
       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:146)
       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:105)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
       at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
       at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
       at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
       at org.apache.hudi.integ.testsuite.dag.nodes.ValidateDatasetNode.getDatasetToValidate(ValidateDatasetNode.java:52)
       at org.apache.hudi.integ.testsuite.dag.nodes.BaseValidateDatasetNode.execute(BaseValidateDatasetNode.java:99)
       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:139)
       ... 6 more
   Caused by: java.lang.ClassNotFoundException: hudi.DefaultSource
       at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
       at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
       at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
       at scala.util.Try$.apply(Try.scala:192)
       at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
       at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
       at scala.util.Try.orElse(Try.scala:84)
       at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
       ... 11 more
   Exception in thread "main" org.apache.hudi.exception.HoodieException: Failed to run Test Suite
       at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:208)
       at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:170)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
       at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:845)
       at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
       at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
       at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
       at org.apache.spark.deploy.SparkSubmit$anon$2.doSubmit(SparkSubmit.scala:920)
       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
       at java.util.concurrent.FutureTask.report(FutureTask.java:122)
       at java.util.concurrent.FutureTask.get(FutureTask.java:206)
       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.execute(DagScheduler.java:113)
       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.schedule(DagScheduler.java:68)
       at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:203)
       ... 13 more
   Caused by: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:146)
       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:105)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
       at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
       at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
       at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
       at org.apache.hudi.integ.testsuite.dag.nodes.ValidateDatasetNode.getDatasetToValidate(ValidateDatasetNode.java:52)
       at org.apache.hudi.integ.testsuite.dag.nodes.BaseValidateDatasetNode.execute(BaseValidateDatasetNode.java:99)
       at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:139)
       ... 6 more
   Caused by: java.lang.ClassNotFoundException: hudi.DefaultSource
       at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
       at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
       at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
       at scala.util.Try$.apply(Try.scala:192)
       at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
       at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
       at scala.util.Try.orElse(Try.scala:84)
       at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
       ... 11 more
   
   [Container] 2022/01/17 06:36:22 Command did not exit successfully sh run-intig-test.sh 2022-01-17 MERGE_ON_READ cow-long-running-example.yaml exit status 1
   
   ```
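For context on the final `Caused by: java.lang.ClassNotFoundException: hudi.DefaultSource`: Spark's `DataSource.lookupDataSource` first checks the provider short names registered via `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister`, and only when no registered provider matches does it fall back to loading the literal name and then `<name>.DefaultSource` as classes. So the trace suggests no Hudi bundle advertising the `hudi` short name was visible on the driver/executor classpath. A toy sketch of that resolution order (an illustration, not Spark's actual code; the function and parameter names are made up):

```python
def lookup_data_source(name, registered, class_exists):
    """Toy model of Spark 2.4's data-source name resolution.

    `registered` stands in for the ServiceLoader-discovered
    DataSourceRegister short names; `class_exists` stands in
    for a classloader lookup.
    """
    # 1. A bundle on the classpath that registers the short name wins.
    if name in registered:
        return registered[name]
    # 2. Otherwise try the name as a class, then as a package containing
    #    a DefaultSource class -- hence "hudi.DefaultSource" in the trace.
    for candidate in (name, name + ".DefaultSource"):
        if class_exists(candidate):
            return candidate
    raise LookupError("Failed to find data source: " + name)
```

With a Hudi Spark bundle on the classpath, `registered` would contain `"hudi"`; without it, both fallback class loads fail, which matches the stacktrace above.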
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
