[ https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181395#comment-17181395 ]
Nishith Agarwal commented on HUDI-1204:
---------------------------------------

[~vinoth] The setup works for me now on my local docker. I'm going to send a PR in a couple of hours so you can all try it. [^complex-dag-cow-2.yaml] I reduced the size of the DAG to validate; the new, smaller, corrected DAG is attached.

Sample output from docker:

{code:java}
root@adhoc-2:/opt# spark-submit \
  --jars /opt/hudi-hive-sync-bundle-0.6.1-SNAPSHOT.jar \
  --packages org.apache.spark:spark-avro_2.11:2.4.0 \
  --conf spark.task.cpus=1 \
  --conf spark.executor.cores=1 \
  --conf spark.task.maxFailures=100 \
  --conf spark.memory.fraction=0.4 \
  --conf spark.rdd.compress=true \
  --conf spark.kryoserializer.buffer.max=2000m \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.memory.storageFraction=0.1 \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.sql.hive.convertMetastoreParquet=false \
  --conf spark.ui.port=5555 \
  --conf spark.driver.maxResultSize=12g \
  --conf spark.executor.heartbeatInterval=120s \
  --conf spark.network.timeout=600s \
  --conf spark.eventLog.overwrite=true \
  --conf spark.eventLog.enabled=true \
  --conf spark.yarn.max.executor.failures=10 \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.sql.shuffle.partitions=1000 \
  --conf spark.driver.extraClassPath=hive-common-2.3.1.jar:hive-exec-2.3.1-core.jar:hive-jdbc-2.3.1.jar:hive-llap-common-2.3.1.jar:hive-metastore-2.3.1.jar:hive-serde-2.3.1.jar:hive-service-2.3.1.jar:hive-service-rpc-2.3.1.jar:hive-shims-0.23-2.3.1.jar:hive-shims-common-2.3.1.jar:hive-storage-api-2.3.1.jar:hive-shims-2.3.1.jar:spark-hive-thriftserver_2.12-3.0.0-preview2.jar:json-20090211.jar \
  --conf spark.executor.extraClassPath=hive-common-2.3.1.jar:hive-exec-2.3.1-core.jar:hive-jdbc-2.3.1.jar:hive-llap-common-2.3.1.jar:hive-metastore-2.3.1.jar:hive-serde-2.3.1.jar:hive-service-2.3.1.jar:hive-service-rpc-2.3.1.jar:hive-shims-0.23-2.3.1.jar:hive-shims-common-2.3.1.jar:hive-storage-api-2.3.1.jar:hive-shims-2.3.1.jar:spark-hive-thriftserver_2.12-3.0.0-preview2.jar:json-20090211.jar \
  --class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob \
  /opt/hudi-integ-test-bundle-0.6.1-SNAPSHOT.jar \
  --source-ordering-field timestamp \
  --target-base-path /user/hive/warehouse/hudi-integ-test-suite/output \
  --input-base-path /user/hive/warehouse/hudi-integ-test-suite/input \
  --target-table table1 \
  --props /var/hoodie/ws/docker/demo/config/test-suite/test-source.properties \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --source-limit 300000000 \
  --source-class org.apache.hudi.utilities.sources.AvroDFSSource \
  --input-file-size 125829120 \
  --workload-yaml-path /var/hoodie/ws/docker/demo/config/test-suite/complex-dag-cow-2.yaml \
  --workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator \
  --table-type COPY_ON_WRITE \
  --compact-scheduling-minshare 1 \
  --hoodie-conf "hoodie.deltastreamer.source.test.num_partitions=100" \
  --hoodie-conf "hoodie.deltastreamer.source.test.datagen.use_rocksdb_for_storing_existing_keys=false" \
  --hoodie-conf "hoodie.deltastreamer.source.test.max_unique_records=100000000" \
  --hoodie-conf "hoodie.embed.timeline.server=false" \
  --hoodie-conf "hoodie.datasource.write.recordkey.field=_row_key" \
  --hoodie-conf "hoodie.deltastreamer.source.dfs.root=/user/hive/warehouse/hudi-integ-test-suite/input" \
  --hoodie-conf "hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator" \
  --hoodie-conf "hoodie.datasource.write.partitionpath.field=timestamp" \
  --hoodie-conf "hoodie.deltastreamer.schemaprovider.source.schema.file=/var/hoodie/ws/docker/demo/config/test-suite/source.avsc" \
  --hoodie-conf "hoodie.datasource.hive_sync.assume_date_partitioning=true" \
  --hoodie-conf "hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hiveserver:10000/" \
  --hoodie-conf "hoodie.datasource.hive_sync.database=testdb" \
  --hoodie-conf "hoodie.datasource.hive_sync.table=table1" \
  --hoodie-conf "hoodie.datasource.hive_sync.partition_fields=_hoodie_partition_path" \
  --hoodie-conf "hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor" \
  --hoodie-conf "hoodie.deltastreamer.keygen.timebased.timestamp.type=UNIX_TIMESTAMP" \
  --hoodie-conf "hoodie.datasource.hive_sync.assume_date_partitioning=false" \
  --hoodie-conf "hoodie.datasource.write.keytranslator.class=org.apache.hudi.DayBasedPartitionPathKeyTranslator" \
  --hoodie-conf "hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd" \
  --hoodie-conf "hoodie.deltastreamer.schemaprovider.target.schema.file=/var/hoodie/ws/docker/demo/config/test-suite/source.avsc"
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-avro_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-4019d059-06cc-4c6f-b848-b35caac9ad6c;1.0
	confs: [default]
	found org.apache.spark#spark-avro_2.11;2.4.0 in central
	found org.spark-project.spark#unused;1.0.0 in central
:: resolution report :: resolve 695ms :: artifacts dl 19ms
	:: modules in use:
	org.apache.spark#spark-avro_2.11;2.4.0 from central in [default]
	org.spark-project.spark#unused;1.0.0 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-4019d059-06cc-4c6f-b848-b35caac9ad6c
	confs: [default]
	0 artifacts copied, 2 already retrieved (0kB/24ms)
20/08/20 19:29:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/08/20 19:29:16 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
20/08/20 19:29:19 WARN SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect.
20/08/20 19:29:23 WARN GenericRecordFullPayloadGenerator: The schema does not have any collections/complex fields. Cannot achieve minPayloadSize : 70000
Fetching source
Fetched source
Nishith souce limit 300000000
Root path => /user/hive/warehouse/hudi-integ-test-suite/input source limit => 300000000
Last checkpoint Option{val=null}
Nishith paths selected (Option{val=hdfs://namenode:8020/user/hive/warehouse/hudi-integ-test-suite/input/1/958101f0-b5e9-4585-80ab-9350abc1a2c4.avro},1597951764580)
20/08/20 19:29:25 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
20/08/20 19:29:25 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
Executing hive sync
Sycning hive
Syncing target hoodie table with hive table(table1). Hive metastore URL :jdbc:hive2://hiveserver:10000/, basePath :/user/hive/warehouse/hudi-integ-test-suite/output
OK
Executing hive query node {}74cc29a3-7218-4886-91e9-d160c71bd94b
Syncing to hive node74cc29a3-7218-4886-91e9-d160c71bd94b
Sycning hive
Syncing target hoodie table with hive table(table1). Hive metastore URL :jdbc:hive2://hiveserver:10000/, basePath :/user/hive/warehouse/hudi-integ-test-suite/output
OK
20/08/20 19:30:34 WARN GenericRecordFullPayloadGenerator: The schema does not have any collections/complex fields. Cannot achieve minPayloadSize : 70000
20/08/20 19:30:34 WARN GenericRecordFullPayloadGenerator: The schema does not have any collections/complex fields. Cannot achieve minPayloadSize : 70000
20/08/20 19:30:34 WARN GenericRecordFullPayloadGenerator: The schema does not have any collections/complex fields. Cannot achieve minPayloadSize : 70000
20/08/20 19:30:35 WARN GenericRecordFullPayloadGenerator: The schema does not have any collections/complex fields. Cannot achieve minPayloadSize : 70000
Fetching source
Fetched source
Nishith souce limit 300000000
Root path => /user/hive/warehouse/hudi-integ-test-suite/input source limit => 300000000
Last checkpoint Option{val=null}
Nishith paths selected (Option{val=hdfs://namenode:8020/user/hive/warehouse/hudi-integ-test-suite/input/1/958101f0-b5e9-4585-80ab-9350abc1a2c4.avro,hdfs://namenode:8020/user/hive/warehouse/hudi-integ-test-suite/input/2/77e1743d-76d9-4c2c-b2d4-d249da37635b.avro,hdfs://namenode:8020/user/hive/warehouse/hudi-integ-test-suite/input/2/e8e66c21-e660-4a57-af09-d760a738dffa.avro,hdfs://namenode:8020/user/hive/warehouse/hudi-integ-test-suite/input/2/6cb9ea98-89a5-4911-8828-2a6d3396d388.avro,hdfs://namenode:8020/user/hive/warehouse/hudi-integ-test-suite/input/2/726c86dd-4893-4480-9bfc-62a9bb663611.avro},1597951835169)
20/08/20 19:30:35 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
20/08/20 19:30:35 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
20/08/20 19:30:35 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
20/08/20 19:30:35 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
20/08/20 19:30:35 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
20/08/20 19:30:35 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
Executing hive query node {}cab3d02a-012f-426c-aa7a-096deb643b71
Syncing to hive nodecab3d02a-012f-426c-aa7a-096deb643b71
Sycning hive
Syncing target hoodie table with hive table(table1). Hive metastore URL :jdbc:hive2://hiveserver:10000/, basePath :/user/hive/warehouse/hudi-integ-test-suite/output
OK
root@adhoc-2:/opt#
{code}

> NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob
> ---------------------------------------------------------------------------
>
>                 Key: HUDI-1204
>                 URL: https://issues.apache.org/jira/browse/HUDI-1204
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Testing
>    Affects Versions: 0.6.1
>            Reporter: sivabalan narayanan
>            Assignee: Nishith Agarwal
>            Priority: Major
>         Attachments: complex-dag-cow-2.yaml
>
> I was trying to run HoodieTestSuiteJob in my local docker setup and ran into a dependency issue.
>
> spark-submit --master local \
>   --class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob \
>   --packages com.databricks:spark-avro_2.11:4.0.0 \
>   /opt/hudi-integ-test-bundle-0.6.0-rc1.jar \
>   --source-ordering-field timestamp \
>   --target-base-path /user/hive/warehouse/hudi-test-suite/output \
>   --input-base-path /user/hive/warehouse/hudi-test-suite/input \
>   --target-table test_table \
>   --props file:///opt/test-source.properties \
>   --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
>   --source-class org.apache.hudi.utilities.sources.AvroDFSSource \
>   --input-file-size 12582912 \
>   --workload-yaml-path /var/hoodie/ws/docker/demo/config/test-suite/complex-dag-cow.yaml \
>   --table-type COPY_ON_WRITE \
>   --workload-generator-classname yaml
>
> {code:java}
> 20/08/19 21:42:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hudi/sync/common/AbstractSyncTool
> 	at java.lang.ClassLoader.defineClass1(Native Method)
> 	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
> 	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> 	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> 	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> 	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$Config.<init>(HoodieDeltaStreamer.java:279)
> 	at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob$HoodieTestSuiteConfig.<init>(HoodieTestSuiteJob.java:153)
> 	at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:114)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> 	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> 	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> 	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> 	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> 	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> 	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> 	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: org.apache.hudi.sync.common.AbstractSyncTool
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> 	... 26 more
> {code}
>
> I tried adding hudi-sync-common as a dependency to hudi-utilities, but that didn't fix the issue.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
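A note on the failure mode: the NoClassDefFoundError means org.apache.hudi.sync.common.AbstractSyncTool was on the classpath at compile time but missing from the fat jar at runtime, which is why declaring hudi-sync-common as a dependency of hudi-utilities alone does not help; the classes also have to be shaded into hudi-integ-test-bundle. The following is a hypothetical sketch of what that could look like in the bundle's pom, assuming it packages with maven-shade-plugin and an explicit artifact include list (the actual pom structure in the Hudi repo may differ):

{code:xml}
<!-- Hypothetical fragment for the integ-test bundle pom. Both pieces matter:
     the <dependency> makes the module resolvable, and the shade <include>
     copies org/apache/hudi/sync/common/** into the bundle jar that
     spark-submit actually loads. -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-sync-common</artifactId>
  <version>${project.version}</version>
</dependency>

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <includes>
        <include>org.apache.hudi:hudi-sync-common</include>
      </includes>
    </artifactSet>
  </configuration>
</plugin>
{code}

To verify the packaging after a rebuild, `jar tf /opt/hudi-integ-test-bundle-0.6.1-SNAPSHOT.jar | grep AbstractSyncTool` should list the class; an empty result reproduces the failure above.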