data-storyteller opened a new issue #4621: URL: https://github.com/apache/hudi/issues/4621
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

Running the Hudi integration test suite on the docker setup fails at the `first_validate` node with `java.lang.ClassNotFoundException: Failed to find data source: hudi`.

**To Reproduce**

Steps to reproduce the behavior:

1. Bring up the Hudi docker demo setup.
2. Run the integration test suite with the `spark-submit` command shown under **Additional context**.
3. The job fails at the `first_validate` node with the stacktrace below.

**Expected behavior**

The integration test suite run completes successfully.

**Environment Description**

* Hudi version : latest (master)
* Spark version : 2.4.7
* Hive version :
* Hadoop version :
* Storage (HDFS/S3/GCS..) :
* Running on Docker? (yes/no) : Yes

**Additional context**

Running the integ test on the docker setup. The tests are failing with the following stacktrace.

Command -

```
docker exec -i adhoc-2 /bin/bash spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.0 --conf spark.task.cpus=1 --conf spark.executor.cores=1 --conf spark.task.maxFailures=100 --conf spark.memory.fraction=0.4 --conf spark.rdd.compress=true --conf spark.kryoserializer.buffer.max=2000m --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.memory.storageFraction=0.1 --conf spark.shuffle.service.enabled=true --conf spark.sql.hive.convertMetastoreParquet=false --conf spark.driver.maxResultSize=12g --conf spark.executor.heartbeatInterval=120s --conf spark.network.timeout=600s --conf spark.yarn.max.executor.failures=10 --conf spark.sql.catalogImplementation=hive --conf spark.driver.extraClassPath=/var/demo/jars/* --conf spark.executor.extraClassPath=/var/demo/jars/* --class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob /opt/$HUDI_JAR_NAME --source-ordering-field test_suite_source_ordering_field --target-base-path /user/hive/warehouse/hudi-integ-test-suite/output --input-base-path /user/hive/warehouse/hudi-integ-test-suite/input --target-table table1 --props file:/var/hoodie/ws/docker/demo/config/test-suite/$PROP_FILE --schemaprovider-class org.apache.hudi.integ.testsuite.schema.TestSuiteFileBasedSchemaProvider --source-class org.apache.hudi.utilities.sources.AvroDFSSource --input-file-size 125829120 --workload-yaml-path file:/var/hoodie/ws/docker/demo/config/test-suite/$YAML_NAME --workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator --table-type $TABLE_TYPE --compact-scheduling-minshare 1 $EXTRA_SPARK_ARGS --clean-input --clean-output
```
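For reference, the failure is raised from `ValidateDatasetNode.getDatasetToValidate`, which reads the target table back through Spark's datasource API using the `hudi` format; Spark then cannot resolve `hudi.DefaultSource`, even though the bundle jars are passed via `spark.driver.extraClassPath` / `spark.executor.extraClassPath`. The snippet below is only a minimal sketch of that kind of read, showing which call trips the `ClassNotFoundException`; it is not the actual test-suite code, and the class name, local master, and output print are illustrative.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Illustrative only: mimics the read that ValidateDatasetNode performs against the
// target base path. Spark resolves format("hudi") to a DefaultSource class, so the
// Hudi datasource jar must be visible to the driver/executor classloaders.
public class HudiReadSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hudi-read-sketch")
        .master("local[*]")   // local master only for a standalone sanity check
        .getOrCreate();

    // Same glob style as the test suite's target path.
    String targetPath = "/user/hive/warehouse/hudi-integ-test-suite/output/*/*/*";

    // This is the call that fails with "Failed to find data source: hudi"
    // when the Hudi DefaultSource class is not on the classpath.
    Dataset<Row> target = spark.read().format("hudi").load(targetPath);
    System.out.println("Row count in target table: " + target.count());

    spark.stop();
  }
}
```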
**Stacktrace**

```
22/01/17 06:36:13 INFO DagNode: Configs : {"name":"a89cea37-7224-4f36-8c00-90306ddf6172","record_size":1000,"repeat_count":1,"num_partitions_insert":1,"num_records_insert":300,"config":"third_insert"}
22/01/17 06:36:13 INFO DagNode: Inserting input data a89cea37-7224-4f36-8c00-90306ddf6172
22/01/17 06:36:13 INFO HoodieTestSuiteJob: Using DFSTestSuitePathSelector, checkpoint: Option{val=2} sourceLimit: 9223372036854775807 lastBatchId: 2 nextBatchId: 3
00:09 WARN: Timeline-server-based markers are configured as the marker type but embedded timeline server is not enabled. Falling back to direct markers.
00:10 WARN: Timeline-server-based markers are configured as the marker type but embedded timeline server is not enabled. Falling back to direct markers.
00:12 WARN: Timeline-server-based markers are configured as the marker type but embedded timeline server is not enabled. Falling back to direct markers.
22/01/17 06:36:16 INFO DagScheduler: Finished executing a89cea37-7224-4f36-8c00-90306ddf6172
22/01/17 06:36:16 WARN DagScheduler: Executing node "first_hive_sync" :: {"queue_name":"adhoc","engine":"mr","name":"994a5035-0362-4c9a-a7d7-e47397f2b113","config":"first_hive_sync"}
22/01/17 06:36:16 INFO DagNode: Executing hive sync node
22/01/17 06:36:19 INFO DagScheduler: Finished executing 994a5035-0362-4c9a-a7d7-e47397f2b113
22/01/17 06:36:19 WARN DagScheduler: Executing node "first_validate" :: {"name":"3f562e32-b7d8-4d96-a977-44b6b876c333","validate_hive":false,"config":"first_validate"}
22/01/17 06:36:19 WARN DagNode: Validation using data from input path /user/hive/warehouse/hudi-integ-test-suite/input/*/*
22/01/17 06:36:21 INFO ValidateDatasetNode: Validate data in target hudi path /user/hive/warehouse/hudi-integ-test-suite/output/*/*/*
22/01/17 06:36:21 ERROR DagScheduler: Exception executing node
java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.hudi.integ.testsuite.dag.nodes.ValidateDatasetNode.getDatasetToValidate(ValidateDatasetNode.java:52)
    at org.apache.hudi.integ.testsuite.dag.nodes.BaseValidateDatasetNode.execute(BaseValidateDatasetNode.java:99)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:139)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:105)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: hudi.DefaultSource
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
    at scala.util.Try.orElse(Try.scala:84)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
    ... 11 more
22/01/17 06:36:21 INFO DagScheduler: Forcing shutdown of executor service, this might kill running tasks
22/01/17 06:36:21 ERROR HoodieTestSuiteJob: Failed to run Test Suite
java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:206)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.execute(DagScheduler.java:113)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.schedule(DagScheduler.java:68)
    at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:203)
    at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:170)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:146)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:105)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.hudi.integ.testsuite.dag.nodes.ValidateDatasetNode.getDatasetToValidate(ValidateDatasetNode.java:52)
    at org.apache.hudi.integ.testsuite.dag.nodes.BaseValidateDatasetNode.execute(BaseValidateDatasetNode.java:99)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:139)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: hudi.DefaultSource
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
    at scala.util.Try.orElse(Try.scala:84)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
    ... 11 more
Exception in thread "main" org.apache.hudi.exception.HoodieException: Failed to run Test Suite
    at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:208)
    at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:170)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:206)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.execute(DagScheduler.java:113)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.schedule(DagScheduler.java:68)
    at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:203)
    ... 13 more
Caused by: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:146)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:105)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.hudi.integ.testsuite.dag.nodes.ValidateDatasetNode.getDatasetToValidate(ValidateDatasetNode.java:52)
    at org.apache.hudi.integ.testsuite.dag.nodes.BaseValidateDatasetNode.execute(BaseValidateDatasetNode.java:99)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:139)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: hudi.DefaultSource
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
    at scala.util.Try.orElse(Try.scala:84)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
    ... 11 more

[Container] 2022/01/17 06:36:22 Command did not exit successfully sh run-intig-test.sh 2022-01-17 MERGE_ON_READ cow-long-running-example.yaml exit status 1
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
