[
https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181233#comment-17181233
]
sivabalan narayanan commented on HUDI-1204:
-------------------------------------------
I followed the suggested steps but am still running into class-not-found
issues.
20/08/20 14:17:14 WARN NativeCodeLoader: Unable to load native-hadoop library
for your platform... using builtin-java classes where applicable
20/08/20 14:17:19 WARN SparkContext: Using an existing SparkContext; some
configuration may not take effect.
20/08/20 14:17:21 WARN SparkSession$Builder: Using an existing SparkSession;
some configuration may not take effect.
20/08/20 14:17:23 WARN GenericRecordFullPayloadGenerator: The schema does not
have any collections/complex fields. Cannot achieve minPayloadSize : 70000
20/08/20 14:17:24 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:17:24 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:17:52 WARN GenericRecordFullPayloadGenerator: The schema does not
have any collections/complex fields. Cannot achieve minPayloadSize : 70000
20/08/20 14:17:52 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:17:52 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:17:52 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:18:24 WARN GenericRecordFullPayloadGenerator: The schema does not
have any collections/complex fields. Cannot achieve minPayloadSize : 70000
20/08/20 14:18:24 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:18:24 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:18:24 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:18:24 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:18:56 WARN GenericRecordFullPayloadGenerator: The schema does not
have any collections/complex fields. Cannot achieve minPayloadSize : 70000
20/08/20 14:18:56 WARN GenericRecordFullPayloadGenerator: The schema does not
have any collections/complex fields. Cannot achieve minPayloadSize : 70000
20/08/20 14:18:56 WARN GenericRecordFullPayloadGenerator: The schema does not
have any collections/complex fields. Cannot achieve minPayloadSize : 70000
20/08/20 14:18:56 WARN GenericRecordFullPayloadGenerator: The schema does not
have any collections/complex fields. Cannot achieve minPayloadSize : 70000
20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:18:57 WARN AvroKeyInputFormat: Reader schema was not set. Use
AvroJob.setInputKeySchema() if desired.
20/08/20 14:20:23 WARN HiveSyncTool: Set partitionFields to empty, since the
NonPartitionedExtractor is used
20/08/20 14:20:25 WARN HiveSyncTool: Set partitionFields to empty, since the
NonPartitionedExtractor is used
20/08/20 14:20:26 WARN HiveSyncTool: Set partitionFields to empty, since the
NonPartitionedExtractor is used
20/08/20 14:20:32 ERROR DagScheduler: Exception executing node
20/08/20 14:20:32 ERROR HoodieTestSuiteJob: Failed to run Test Suite
java.util.concurrent.ExecutionException:
org.apache.hudi.exception.HoodieException:
org.apache.hive.service.cli.HiveSQLException: Error running query:
java.lang.NoClassDefFoundError: scala/collection/Iterable
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:206)
at
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.execute(DagScheduler.java:81)
at
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.schedule(DagScheduler.java:54)
at
org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:140)
at
org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:123)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieException:
org.apache.hive.service.cli.HiveSQLException: Error running query:
java.lang.NoClassDefFoundError: scala/collection/Iterable
at
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:97)
at
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:73)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error running query:
java.lang.NoClassDefFoundError: scala/collection/Iterable
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:242)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)
at
org.apache.hudi.integ.testsuite.dag.nodes.HiveQueryNode.execute(HiveQueryNode.java:63)
at
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:92)
... 6 more
Caused by: org.apache.hive.service.cli.HiveSQLException: Error running query:
java.lang.NoClassDefFoundError: scala/collection/Iterable
at
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:238)
at
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
at
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
at
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
at
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
at
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
at
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
at
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
... 3 more
Caused by: java.lang.NoClassDefFoundError: scala/collection/Iterable
at
org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.getSparkMemoryAndCores(SetSparkReducerParallelism.java:236)
at
org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:173)
at
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at
org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runSetReducerParallelism(SparkCompiler.java:288)
at
org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:122)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:140)
at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11273)
at
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1295)
at
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
... 15 more
Caused by: java.lang.ClassNotFoundException: scala.collection.Iterable
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 35 more
Exception in thread "main" org.apache.hudi.exception.HoodieException: Failed to
run Test Suite
at
org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:144)
at
org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:123)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.concurrent.ExecutionException:
org.apache.hudi.exception.HoodieException:
org.apache.hive.service.cli.HiveSQLException: Error running query:
java.lang.NoClassDefFoundError: scala/collection/Iterable
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:206)
at
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.execute(DagScheduler.java:81)
at
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.schedule(DagScheduler.java:54)
at
org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:140)
... 13 more
Caused by: org.apache.hudi.exception.HoodieException:
org.apache.hive.service.cli.HiveSQLException: Error running query:
java.lang.NoClassDefFoundError: scala/collection/Iterable
at
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:97)
at
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:73)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error running query:
java.lang.NoClassDefFoundError: scala/collection/Iterable
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:242)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)
at
org.apache.hudi.integ.testsuite.dag.nodes.HiveQueryNode.execute(HiveQueryNode.java:63)
at
org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:92)
... 6 more
Caused by: org.apache.hive.service.cli.HiveSQLException: Error running query:
java.lang.NoClassDefFoundError: scala/collection/Iterable
at
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:238)
at
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
at
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
at
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
at
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
at
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
at
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
at
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
... 3 more
Caused by: java.lang.NoClassDefFoundError: scala/collection/Iterable
at
org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.getSparkMemoryAndCores(SetSparkReducerParallelism.java:236)
at
org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:173)
at
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at
org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runSetReducerParallelism(SparkCompiler.java:288)
at
org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:122)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:140)
at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11273)
at
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1295)
at
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
... 15 more
Caused by: java.lang.ClassNotFoundException: scala.collection.Iterable
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 35 more
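A possibly relevant check, offered as a sketch rather than a diagnosis: the spark-submit command below mixes a _2.11 artifact (spark-avro) with a _2.12 one (spark-hive-thriftserver). Mixed Scala binary versions on one classpath are a common source of class-loading errors. That said, the trace above shows the failure surfacing through the Hive JDBC call, so the missing scala class may equally be a matter of the Hive server's own classpath. The Python snippet below lists the Scala suffixes found in a command line. The function name scala_versions, the regular expression and the abbreviated command string are illustrative assumptions, not part of Hudi or Spark.

```python
import re
from collections import defaultdict

# A Scala binary-version suffix such as "_2.11", followed by ":" (Maven
# coordinate) or "-" (jar file name). Illustrative pattern, not exhaustive.
SCALA_SUFFIX = re.compile(r"([A-Za-z0-9.\-]+)_(2\.\d+)(?=[:\-])")


def scala_versions(command):
    """Map each Scala binary version found in `command` to its artifact names."""
    found = defaultdict(set)
    for artifact, version in SCALA_SUFFIX.findall(command):
        found[version].add(artifact)
    return dict(found)


# Abbreviated from the spark-submit command in this comment.
command = (
    "--packages org.apache.spark:spark-avro_2.11:2.4.0 "
    "--conf spark.driver.extraClassPath=hive-common-2.3.1.jar:"
    "spark-hive-thriftserver_2.12-3.0.0-preview2.jar:json-20090211.jar"
)

for version, artifacts in sorted(scala_versions(command).items()):
    print(version, sorted(artifacts))
```

More than one version in the output means the classpath mixes Scala builds. For the abbreviated command it reports spark-avro under 2.11 and spark-hive-thriftserver under 2.12.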
Command that I ran:
spark-submit \
  --jars /opt/hudi-hive-sync-bundle-0.6.0-rc1.jar \
  --packages org.apache.spark:spark-avro_2.11:2.4.0 \
  --conf spark.task.cpus=1 \
  --conf spark.executor.cores=1 \
  --conf spark.task.maxFailures=100 \
  --conf spark.memory.fraction=0.4 \
  --conf spark.rdd.compress=true \
  --conf spark.kryoserializer.buffer.max=2000m \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.memory.storageFraction=0.1 \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.sql.hive.convertMetastoreParquet=false \
  --conf spark.ui.port=5555 \
  --conf spark.driver.maxResultSize=12g \
  --conf spark.executor.heartbeatInterval=120s \
  --conf spark.network.timeout=600s \
  --conf spark.eventLog.overwrite=true \
  --conf spark.eventLog.enabled=true \
  --conf spark.yarn.max.executor.failures=10 \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.sql.shuffle.partitions=1000 \
  --conf spark.driver.extraClassPath=hive-common-2.3.1.jar:hive-exec-2.3.1-core.jar:hive-jdbc-2.3.1.jar:hive-llap-common-2.3.1.jar:hive-metastore-2.3.1.jar:hive-serde-2.3.1.jar:hive-service-2.3.1.jar:hive-service-rpc-2.3.1.jar:hive-shims-0.23-2.3.1.jar:hive-shims-common-2.3.1.jar:hive-storage-api-2.3.1.jar:hive-shims-2.3.1.jar:spark-hive-thriftserver_2.12-3.0.0-preview2.jar:json-20090211.jar \
  --conf spark.executor.extraClassPath=hive-common-2.3.1.jar:hive-exec-2.3.1-core.jar:hive-jdbc-2.3.1.jar:hive-llap-common-2.3.1.jar:hive-metastore-2.3.1.jar:hive-serde-2.3.1.jar:hive-service-2.3.1.jar:hive-service-rpc-2.3.1.jar:hive-shims-0.23-2.3.1.jar:hive-shims-common-2.3.1.jar:hive-storage-api-2.3.1.jar:hive-shims-2.3.1.jar:spark-hive-thriftserver_2.12-3.0.0-preview2.jar:json-20090211.jar \
  --class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob \
  /opt/hudi-integ-test-bundle-0.6.0-rc1.jar \
  --source-ordering-field timestamp \
  --target-base-path /user/hive/warehouse/hudi-integ-test-suite/output \
  --input-base-path /user/hive/warehouse/hudi-integ-test-suite/input \
  --target-table table1 \
  --props /var/hoodie/ws/docker/demo/config/test-suite/test-source.properties \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --source-limit 300000000 \
  --source-class org.apache.hudi.utilities.sources.AvroDFSSource \
  --input-file-size 125829120 \
  --workload-yaml-path /var/hoodie/ws/docker/demo/config/test-suite/complex-dag-cow.yaml \
  --workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator \
  --table-type COPY_ON_WRITE \
  --compact-scheduling-minshare 1 \
  --hoodie-conf "hoodie.deltastreamer.source.test.num_partitions=100" \
  --hoodie-conf "hoodie.deltastreamer.source.test.datagen.use_rocksdb_for_storing_existing_keys=false" \
  --hoodie-conf "hoodie.deltastreamer.source.test.max_unique_records=100000000" \
  --hoodie-conf "hoodie.embed.timeline.server=false" \
  --hoodie-conf "hoodie.datasource.write.recordkey.field=_row_key" \
  --hoodie-conf "hoodie.deltastreamer.source.dfs.root=/user/hive/warehouse/hudi-integ-test-suite/input" \
  --hoodie-conf "hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator" \
  --hoodie-conf "hoodie.datasource.write.partitionpath.field=timestamp" \
  --hoodie-conf "hoodie.deltastreamer.schemaprovider.source.schema.file=/var/hoodie/ws/docker/demo/config/test-suite/source.avsc" \
  --hoodie-conf "hoodie.datasource.hive_sync.assume_date_partitioning=false" \
  --hoodie-conf "hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hiveserver:10000/" \
  --hoodie-conf "hoodie.datasource.hive_sync.database=testdb" \
  --hoodie-conf "hoodie.datasource.hive_sync.table=table1" \
  --hoodie-conf "hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor" \
  --hoodie-conf "hoodie.datasource.hive_sync.assume_date_partitioning=true" \
  --hoodie-conf "hoodie.datasource.write.keytranslator.class=org.apache.hudi.DayBasedPartitionPathKeyTranslator" \
  --hoodie-conf "hoodie.deltastreamer.schemaprovider.target.schema.file=/var/hoodie/ws/docker/demo/config/test-suite/source.avsc"
> NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob
> ---------------------------------------------------------------------------
>
> Key: HUDI-1204
> URL: https://issues.apache.org/jira/browse/HUDI-1204
> Project: Apache Hudi
> Issue Type: Bug
> Components: Testing
> Affects Versions: 0.6.1
> Reporter: sivabalan narayanan
> Assignee: Nishith Agarwal
> Priority: Major
>
> I was trying to run HoodieTestSuiteJob in my local Docker setup and ran into a
> dependency issue.
>
> spark-submit --master local \
>   --class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob \
>   --packages com.databricks:spark-avro_2.11:4.0.0 \
>   /opt/hudi-integ-test-bundle-0.6.0-rc1.jar \
>   --source-ordering-field timestamp \
>   --target-base-path /user/hive/warehouse/hudi-test-suite/output \
>   --input-base-path /user/hive/warehouse/hudi-test-suite/input \
>   --target-table test_table \
>   --props file:///opt/test-source.properties \
>   --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
>   --source-class org.apache.hudi.utilities.sources.AvroDFSSource \
>   --input-file-size 12582912 \
>   --workload-yaml-path /var/hoodie/ws/docker/demo/config/test-suite/complex-dag-cow.yaml \
>   --table-type COPY_ON_WRITE \
>   --workload-generator-classname yaml
>
> {code:java}
> 20/08/19 21:42:26 WARN NativeCodeLoader: Unable to load native-hadoop library
> for your platform... using builtin-java classes where applicable
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/hudi/sync/common/AbstractSyncTool
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$Config.<init>(HoodieDeltaStreamer.java:279)
> at
> org.apache.hudi.integ.testsuite.HoodieTestSuiteJob$HoodieTestSuiteConfig.<init>(HoodieTestSuiteJob.java:153)
> at
> org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:114)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> at
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.hudi.sync.common.AbstractSyncTool
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 26 more
> {code}
> I tried adding hudi-sync-common as a dependency of hudi-utilities, but that
> didn't fix the issue.
>