[ 
https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181233#comment-17181233
 ] 

sivabalan narayanan commented on HUDI-1204:
-------------------------------------------

I followed the suggested steps, but I am still running into a class-not-found 
error, this time java.lang.NoClassDefFoundError: scala/collection/Iterable. 
Full log below.

 

20/08/20 14:17:14 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable

20/08/20 14:17:19 WARN SparkContext: Using an existing SparkContext; some 
configuration may not take effect.

20/08/20 14:17:21 WARN SparkSession$Builder: Using an existing SparkSession; 
some configuration may not take effect.

20/08/20 14:17:23 WARN GenericRecordFullPayloadGenerator: The schema does not 
have any collections/complex fields. Cannot achieve minPayloadSize : 70000

20/08/20 14:17:24 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:17:24 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:17:52 WARN GenericRecordFullPayloadGenerator: The schema does not 
have any collections/complex fields. Cannot achieve minPayloadSize : 70000

20/08/20 14:17:52 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:17:52 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:17:52 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:18:24 WARN GenericRecordFullPayloadGenerator: The schema does not 
have any collections/complex fields. Cannot achieve minPayloadSize : 70000

20/08/20 14:18:24 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:18:24 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:18:24 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:18:24 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:18:56 WARN GenericRecordFullPayloadGenerator: The schema does not 
have any collections/complex fields. Cannot achieve minPayloadSize : 70000

20/08/20 14:18:56 WARN GenericRecordFullPayloadGenerator: The schema does not 
have any collections/complex fields. Cannot achieve minPayloadSize : 70000

20/08/20 14:18:56 WARN GenericRecordFullPayloadGenerator: The schema does not 
have any collections/complex fields. Cannot achieve minPayloadSize : 70000

20/08/20 14:18:56 WARN GenericRecordFullPayloadGenerator: The schema does not 
have any collections/complex fields. Cannot achieve minPayloadSize : 70000

20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use 
AvroJob.setInputKeySchema() if desired.

20/08/20 14:20:23 WARN HiveSyncTool: Set partitionFields to empty, since the 
NonPartitionedExtractor is used

20/08/20 14:20:25 WARN HiveSyncTool: Set partitionFields to empty, since the 
NonPartitionedExtractor is used

20/08/20 14:20:26 WARN HiveSyncTool: Set partitionFields to empty, since the 
NonPartitionedExtractor is used

20/08/20 14:20:32 ERROR DagScheduler: Exception executing node

20/08/20 14:20:32 ERROR HoodieTestSuiteJob: Failed to run Test Suite 

java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: 
org.apache.hive.service.cli.HiveSQLException: Error running query: 
java.lang.NoClassDefFoundError: scala/collection/Iterable

 at java.util.concurrent.FutureTask.report(FutureTask.java:122)

 at java.util.concurrent.FutureTask.get(FutureTask.java:206)

 at 
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.execute(DagScheduler.java:81)

 at 
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.schedule(DagScheduler.java:54)

 at 
org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:140)

 at 
org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:123)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 at java.lang.reflect.Method.invoke(Method.java:498)

 at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)

 at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)

 at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)

 at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)

 at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)

 at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)

 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)

 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: org.apache.hudi.exception.HoodieException: 
org.apache.hive.service.cli.HiveSQLException: Error running query: 
java.lang.NoClassDefFoundError: scala/collection/Iterable

 at 
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:97)

 at 
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:73)

 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

 at java.util.concurrent.FutureTask.run(FutureTask.java:266)

 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

 at java.lang.Thread.run(Thread.java:748)

Caused by: org.apache.hive.service.cli.HiveSQLException: Error running query: 
java.lang.NoClassDefFoundError: scala/collection/Iterable

 at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)

 at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:242)

 at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)

 at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)

 at 
org.apache.hudi.integ.testsuite.dag.nodes.HiveQueryNode.execute(HiveQueryNode.java:63)

 at 
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:92)

 ... 6 more

Caused by: org.apache.hive.service.cli.HiveSQLException: Error running query: 
java.lang.NoClassDefFoundError: scala/collection/Iterable

 at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:238)

 at 
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)

 at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)

 at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)

 at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)

 at 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)

 at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)

 at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)

 at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)

 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)

 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)

 at 
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)

 at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)

 ... 3 more

Caused by: java.lang.NoClassDefFoundError: scala/collection/Iterable

 at 
org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.getSparkMemoryAndCores(SetSparkReducerParallelism.java:236)

 at 
org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:173)

 at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)

 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)

 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)

 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)

 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)

 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)

 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)

 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)

 at 
org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runSetReducerParallelism(SparkCompiler.java:288)

 at 
org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:122)

 at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:140)

 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11273)

 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)

 at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)

 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)

 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)

 at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1295)

 at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)

 ... 15 more

Caused by: java.lang.ClassNotFoundException: scala.collection.Iterable

 at java.net.URLClassLoader.findClass(URLClassLoader.java:382)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

 ... 35 more

Exception in thread "main" org.apache.hudi.exception.HoodieException: Failed to 
run Test Suite 

 at 
org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:144)

 at 
org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:123)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 at java.lang.reflect.Method.invoke(Method.java:498)

 at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)

 at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)

 at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)

 at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)

 at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)

 at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)

 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)

 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: 
org.apache.hive.service.cli.HiveSQLException: Error running query: 
java.lang.NoClassDefFoundError: scala/collection/Iterable

 at java.util.concurrent.FutureTask.report(FutureTask.java:122)

 at java.util.concurrent.FutureTask.get(FutureTask.java:206)

 at 
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.execute(DagScheduler.java:81)

 at 
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.schedule(DagScheduler.java:54)

 at 
org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:140)

 ... 13 more

Caused by: org.apache.hudi.exception.HoodieException: 
org.apache.hive.service.cli.HiveSQLException: Error running query: 
java.lang.NoClassDefFoundError: scala/collection/Iterable

 at 
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:97)

 at 
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:73)

 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

 at java.util.concurrent.FutureTask.run(FutureTask.java:266)

 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

 at java.lang.Thread.run(Thread.java:748)

Caused by: org.apache.hive.service.cli.HiveSQLException: Error running query: 
java.lang.NoClassDefFoundError: scala/collection/Iterable

 at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)

 at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:242)

 at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)

 at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)

 at 
org.apache.hudi.integ.testsuite.dag.nodes.HiveQueryNode.execute(HiveQueryNode.java:63)

 at 
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:92)

 ... 6 more

Caused by: org.apache.hive.service.cli.HiveSQLException: Error running query: 
java.lang.NoClassDefFoundError: scala/collection/Iterable

 at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:238)

 at 
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)

 at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)

 at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)

 at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)

 at 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)

 at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)

 at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)

 at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)

 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)

 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)

 at 
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)

 at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)

 ... 3 more

Caused by: java.lang.NoClassDefFoundError: scala/collection/Iterable

 at 
org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.getSparkMemoryAndCores(SetSparkReducerParallelism.java:236)

 at 
org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:173)

 at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)

 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)

 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)

 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)

 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)

 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)

 at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)

 at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)

 at 
org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runSetReducerParallelism(SparkCompiler.java:288)

 at 
org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:122)

 at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:140)

 at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11273)

 at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)

 at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)

 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)

 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)

 at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1295)

 at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)

 ... 15 more

Caused by: java.lang.ClassNotFoundException: scala.collection.Iterable

 at java.net.URLClassLoader.findClass(URLClassLoader.java:382)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

 ... 35 more
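
The trace above nests several "Caused by:" sections, and the innermost one is the easiest to miss. As a minimal sketch (not part of Hudi or Hive; the helper name `root_cause` is made up for illustration), the following pulls the last "Caused by:" entry out of a pasted trace, joining messages that the mail client wrapped across lines:

```python
import re

# A stack frame line ("at ..." or a bare "at" left by line wrapping)
# or a "... N more" marker ends the exception message.
FRAME = re.compile(r"^(at(\s|$)|\.\.\. \d+ more)")


def root_cause(trace):
    """Return the innermost 'Caused by:' message of a Java stack trace, or None."""
    lines = [ln.strip() for ln in trace.splitlines() if ln.strip()]
    cause = None
    for i, line in enumerate(lines):
        if not line.startswith("Caused by:"):
            continue
        parts = [line[len("Caused by:"):].strip()]
        # Collect wrapped continuation lines until the first frame.
        for nxt in lines[i + 1:]:
            if FRAME.match(nxt) or nxt.startswith("Caused by:"):
                break
            parts.append(nxt)
        cause = " ".join(parts)  # later sections overwrite earlier ones
    return cause


sample = """
Caused by: org.apache.hive.service.cli.HiveSQLException: Error running query:
java.lang.NoClassDefFoundError: scala/collection/Iterable
 at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)
 ... 6 more
Caused by: java.lang.ClassNotFoundException: scala.collection.Iterable
 at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
 ... 35 more
"""

print(root_cause(sample))
```

For the log in this comment, the innermost cause is the ClassNotFoundException for scala.collection.Iterable, and the frames just above it are in org.apache.hadoop.hive.ql and org.apache.hive.service.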

 

 

Command that I ran:

 

spark-submit --jars /opt/hudi-hive-sync-bundle-0.6.0-rc1.jar --packages 
org.apache.spark:spark-avro_2.11:2.4.0 --conf spark.task.cpus=1 --conf 
spark.executor.cores=1 --conf spark.task.maxFailures=100 --conf 
spark.memory.fraction=0.4 --conf spark.rdd.compress=true --conf 
spark.kryoserializer.buffer.max=2000m --conf 
spark.serializer=org.apache.spark.serializer.KryoSerializer --conf 
spark.memory.storageFraction=0.1 --conf spark.shuffle.service.enabled=true 
--conf spark.sql.hive.convertMetastoreParquet=false --conf spark.ui.port=5555 
--conf spark.driver.maxResultSize=12g --conf 
spark.executor.heartbeatInterval=120s --conf spark.network.timeout=600s --conf 
spark.eventLog.overwrite=true --conf spark.eventLog.enabled=true --conf 
spark.yarn.max.executor.failures=10 --conf spark.sql.catalogImplementation=hive 
--conf spark.sql.shuffle.partitions=1000 --conf 
spark.driver.extraClassPath=hive-common-2.3.1.jar:hive-exec-2.3.1-core.jar:hive-jdbc-2.3.1.jar:hive-llap-common-2.3.1.jar:hive-metastore-2.3.1.jar:hive-serde-2.3.1.jar:hive-service-2.3.1.jar:hive-service-rpc-2.3.1.jar:hive-shims-0.23-2.3.1.jar:hive-shims-common-2.3.1.jar:hive-storage-api-2.3.1.jar:hive-shims-2.3.1.jar:spark-hive-thriftserver_2.12-3.0.0-preview2.jar:json-20090211.jar
 --conf 
spark.executor.extraClassPath=hive-common-2.3.1.jar:hive-exec-2.3.1-core.jar:hive-jdbc-2.3.1.jar:hive-llap-common-2.3.1.jar:hive-metastore-2.3.1.jar:hive-serde-2.3.1.jar:hive-service-2.3.1.jar:hive-service-rpc-2.3.1.jar:hive-shims-0.23-2.3.1.jar:hive-shims-common-2.3.1.jar:hive-storage-api-2.3.1.jar:hive-shims-2.3.1.jar:spark-hive-thriftserver_2.12-3.0.0-preview2.jar:json-20090211.jar
 --class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob 
/opt/hudi-integ-test-bundle-0.6.0-rc1.jar --source-ordering-field timestamp 
--target-base-path /user/hive/warehouse/hudi-integ-test-suite/output 
--input-base-path /user/hive/warehouse/hudi-integ-test-suite/input 
--target-table table1 --props 
/var/hoodie/ws/docker/demo/config/test-suite/test-source.properties 
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider 
--source-limit 300000000 --source-class 
org.apache.hudi.utilities.sources.AvroDFSSource --input-file-size 125829120 
--workload-yaml-path 
/var/hoodie/ws/docker/demo/config/test-suite/complex-dag-cow.yaml 
--workload-generator-classname 
org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator --table-type 
COPY_ON_WRITE --compact-scheduling-minshare 1 --hoodie-conf 
"hoodie.deltastreamer.source.test.num_partitions=100" --hoodie-conf 
"hoodie.deltastreamer.source.test.datagen.use_rocksdb_for_storing_existing_keys=false"
 --hoodie-conf "hoodie.deltastreamer.source.test.max_unique_records=100000000" 
--hoodie-conf "hoodie.embed.timeline.server=false" --hoodie-conf 
"hoodie.datasource.write.recordkey.field=_row_key" --hoodie-conf 
"hoodie.deltastreamer.source.dfs.root=/user/hive/warehouse/hudi-integ-test-suite/input"
 --hoodie-conf 
"hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator"
 --hoodie-conf "hoodie.datasource.write.partitionpath.field=timestamp" 
--hoodie-conf 
"hoodie.deltastreamer.schemaprovider.source.schema.file=/var/hoodie/ws/docker/demo/config/test-suite/source.avsc"
 --hoodie-conf "hoodie.datasource.hive_sync.assume_date_partitioning=false" 
--hoodie-conf 
"hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hiveserver:10000/" 
--hoodie-conf "hoodie.datasource.hive_sync.database=testdb" --hoodie-conf 
"hoodie.datasource.hive_sync.table=table1" --hoodie-conf 
"hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor"
 --hoodie-conf "hoodie.datasource.hive_sync.assume_date_partitioning=true" 
--hoodie-conf 
"hoodie.datasource.write.keytranslator.class=org.apache.hudi.DayBasedPartitionPathKeyTranslator"
 --hoodie-conf 
"hoodie.deltastreamer.schemaprovider.target.schema.file=/var/hoodie/ws/docker/demo/config/test-suite/source.avsc"
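
One thing visible in the command above is that it names artifacts built for different Scala binary versions (spark-avro_2.11 next to spark-hive-thriftserver_2.12). Whether that has anything to do with the error is not established here. A small sketch for spotting such mixes in a list of artifact names, assuming the usual `_<scala version>` suffix convention (the helper name `scala_versions` is hypothetical):

```python
import re

# Matches the Scala binary-version suffix, e.g. "_2.11" in "spark-avro_2.11:2.4.0".
SCALA_SUFFIX = re.compile(r"_(2\.\d+)(?=[-:.]|$)")


def scala_versions(artifacts):
    """Group artifact names by the Scala binary version in their suffix."""
    found = {}
    for name in artifacts:
        match = SCALA_SUFFIX.search(name)
        if match:  # artifacts without a Scala suffix are skipped
            found.setdefault(match.group(1), []).append(name)
    return found


# Names copied from the spark-submit command in this comment.
artifacts = [
    "org.apache.spark:spark-avro_2.11:2.4.0",
    "spark-hive-thriftserver_2.12-3.0.0-preview2.jar",
    "hive-common-2.3.1.jar",
    "hive-shims-0.23-2.3.1.jar",
]

for version, names in sorted(scala_versions(artifacts).items()):
    print(version, names)
```

More than one key in the result means the classpath mixes Scala binary versions.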

 

> NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob
> ---------------------------------------------------------------------------
>
>                 Key: HUDI-1204
>                 URL: https://issues.apache.org/jira/browse/HUDI-1204
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Testing
>    Affects Versions: 0.6.1
>            Reporter: sivabalan narayanan
>            Assignee: Nishith Agarwal
>            Priority: Major
>
> I was trying to run HoodieTestSuiteJob in my local Docker setup and ran into 
> a dependency issue.
>  
> spark-submit --master local --class 
> org.apache.hudi.integ.testsuite.HoodieTestSuiteJob --packages 
> com.databricks:spark-avro_2.11:4.0.0 
> /opt/hudi-integ-test-bundle-0.6.0-rc1.jar  --source-ordering-field timestamp  
>   --target-base-path /user/hive/warehouse/hudi-test-suite/output    
> --input-base-path /user/hive/warehouse/hudi-test-suite/input    
> --target-table test_table    --props [file:///opt/test-source.properties]    
> --schemaprovider-class 
> org.apache.hudi.utilities.schema.FilebasedSchemaProvider    --source-class 
> org.apache.hudi.utilities.sources.AvroDFSSource    --input-file-size 12582912 
>  --workload-yaml-path 
> /var/hoodie/ws/docker/demo/config/test-suite/complex-dag-cow.yaml 
> --table-type COPY_ON_WRITE    --workload-generator-classname yaml
>  
> {code:java}
> 20/08/19 21:42:26 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hudi/sync/common/AbstractSyncTool
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$Config.<init>(HoodieDeltaStreamer.java:279)
> at 
> org.apache.hudi.integ.testsuite.HoodieTestSuiteJob$HoodieTestSuiteConfig.<init>(HoodieTestSuiteJob.java:153)
> at 
> org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:114)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hudi.sync.common.AbstractSyncTool
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 26 more
>  {code}
> I tried adding hudi-sync-common as a dependency of hudi-utilities, but that 
> didn't fix the issue. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
