[ 
https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181395#comment-17181395
 ] 

Nishith Agarwal commented on HUDI-1204:
---------------------------------------

[~vinoth] The setup works for me now in my local docker environment. I'll send a PR 
in a couple of hours so you can all try it as well.
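
For context, a hypothetical sketch of the kind of bundle change that addresses the NoClassDefFoundError below: packaging hudi-sync-common into the integ-test bundle so that org.apache.hudi.sync.common.AbstractSyncTool is on the classpath. The module coordinates and shade configuration here are assumptions; the actual PR may differ.
{code:xml}
<!-- Hypothetical sketch only; the actual PR may differ. Add hudi-sync-common
     to the hudi-integ-test-bundle pom so AbstractSyncTool ships in the jar. -->
<dependency>
  <groupId>org.apache.hudi</groupId>
  <artifactId>hudi-sync-common</artifactId>
  <version>${project.version}</version>
</dependency>

<!-- ...plus the matching entry in the maven-shade-plugin <artifactSet>: -->
<include>org.apache.hudi:hudi-sync-common</include>
{code}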

[^complex-dag-cow-2.yaml] Reduced the size of the DAG used for validation; the new, 
smaller, corrected DAG is attached.
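
For anyone reproducing this without the attachment, a minimal sketch of the test-suite DAG yaml format; the node names and config values below are illustrative, not the attached file's contents:
{code:yaml}
# Illustrative structure only -- see the attached complex-dag-cow-2.yaml
# for the actual DAG used in this run.
first_insert:              # generate and insert a batch of records
  config:
    record_size: 70000
    num_insert_partitions: 1
    repeat_count: 2
    num_records_insert: 100
  type: InsertNode
  deps: none
first_hive_sync:           # sync the target table to the hive metastore
  config:
    queue_name: "adhoc"
    engine: "mr"
  type: HiveSyncNode
  deps: first_insert
first_hive_query:          # validate via a hive query against the synced table
  config:
    queue_name: "adhoc"
    engine: "mr"
  type: HiveQueryNode
  deps: first_hive_sync
{code}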

Sample output from docker:
{code:java}
root@adhoc-2:/opt# spark-submit \
  --jars /opt/hudi-hive-sync-bundle-0.6.1-SNAPSHOT.jar \
  --packages org.apache.spark:spark-avro_2.11:2.4.0 \
  --conf spark.task.cpus=1 \
  --conf spark.executor.cores=1 \
  --conf spark.task.maxFailures=100 \
  --conf spark.memory.fraction=0.4 \
  --conf spark.rdd.compress=true \
  --conf spark.kryoserializer.buffer.max=2000m \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.memory.storageFraction=0.1 \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.sql.hive.convertMetastoreParquet=false \
  --conf spark.ui.port=5555 \
  --conf spark.driver.maxResultSize=12g \
  --conf spark.executor.heartbeatInterval=120s \
  --conf spark.network.timeout=600s \
  --conf spark.eventLog.overwrite=true \
  --conf spark.eventLog.enabled=true \
  --conf spark.yarn.max.executor.failures=10 \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.sql.shuffle.partitions=1000 \
  --conf spark.driver.extraClassPath=hive-common-2.3.1.jar:hive-exec-2.3.1-core.jar:hive-jdbc-2.3.1.jar:hive-llap-common-2.3.1.jar:hive-metastore-2.3.1.jar:hive-serde-2.3.1.jar:hive-service-2.3.1.jar:hive-service-rpc-2.3.1.jar:hive-shims-0.23-2.3.1.jar:hive-shims-common-2.3.1.jar:hive-storage-api-2.3.1.jar:hive-shims-2.3.1.jar:spark-hive-thriftserver_2.12-3.0.0-preview2.jar:json-20090211.jar \
  --conf spark.executor.extraClassPath=hive-common-2.3.1.jar:hive-exec-2.3.1-core.jar:hive-jdbc-2.3.1.jar:hive-llap-common-2.3.1.jar:hive-metastore-2.3.1.jar:hive-serde-2.3.1.jar:hive-service-2.3.1.jar:hive-service-rpc-2.3.1.jar:hive-shims-0.23-2.3.1.jar:hive-shims-common-2.3.1.jar:hive-storage-api-2.3.1.jar:hive-shims-2.3.1.jar:spark-hive-thriftserver_2.12-3.0.0-preview2.jar:json-20090211.jar \
  --class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob \
  /opt/hudi-integ-test-bundle-0.6.1-SNAPSHOT.jar \
  --source-ordering-field timestamp \
  --target-base-path /user/hive/warehouse/hudi-integ-test-suite/output \
  --input-base-path /user/hive/warehouse/hudi-integ-test-suite/input \
  --target-table table1 \
  --props /var/hoodie/ws/docker/demo/config/test-suite/test-source.properties \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --source-limit 300000000 \
  --source-class org.apache.hudi.utilities.sources.AvroDFSSource \
  --input-file-size 125829120 \
  --workload-yaml-path /var/hoodie/ws/docker/demo/config/test-suite/complex-dag-cow-2.yaml \
  --workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator \
  --table-type COPY_ON_WRITE \
  --compact-scheduling-minshare 1 \
  --hoodie-conf "hoodie.deltastreamer.source.test.num_partitions=100" \
  --hoodie-conf "hoodie.deltastreamer.source.test.datagen.use_rocksdb_for_storing_existing_keys=false" \
  --hoodie-conf "hoodie.deltastreamer.source.test.max_unique_records=100000000" \
  --hoodie-conf "hoodie.embed.timeline.server=false" \
  --hoodie-conf "hoodie.datasource.write.recordkey.field=_row_key" \
  --hoodie-conf "hoodie.deltastreamer.source.dfs.root=/user/hive/warehouse/hudi-integ-test-suite/input" \
  --hoodie-conf "hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator" \
  --hoodie-conf "hoodie.datasource.write.partitionpath.field=timestamp" \
  --hoodie-conf "hoodie.deltastreamer.schemaprovider.source.schema.file=/var/hoodie/ws/docker/demo/config/test-suite/source.avsc" \
  --hoodie-conf "hoodie.datasource.hive_sync.assume_date_partitioning=true" \
  --hoodie-conf "hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hiveserver:10000/" \
  --hoodie-conf "hoodie.datasource.hive_sync.database=testdb" \
  --hoodie-conf "hoodie.datasource.hive_sync.table=table1" \
  --hoodie-conf "hoodie.datasource.hive_sync.partition_fields=_hoodie_partition_path" \
  --hoodie-conf "hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor" \
  --hoodie-conf "hoodie.deltastreamer.keygen.timebased.timestamp.type=UNIX_TIMESTAMP" \
  --hoodie-conf "hoodie.datasource.hive_sync.assume_date_partitioning=false" \
  --hoodie-conf "hoodie.datasource.write.keytranslator.class=org.apache.hudi.DayBasedPartitionPathKeyTranslator" \
  --hoodie-conf "hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd" \
  --hoodie-conf "hoodie.deltastreamer.schemaprovider.target.schema.file=/var/hoodie/ws/docker/demo/config/test-suite/source.avsc"

Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-avro_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-4019d059-06cc-4c6f-b848-b35caac9ad6c;1.0
        confs: [default]
        found org.apache.spark#spark-avro_2.11;2.4.0 in central
        found org.spark-project.spark#unused;1.0.0 in central
:: resolution report :: resolve 695ms :: artifacts dl 19ms
        :: modules in use:
        org.apache.spark#spark-avro_2.11;2.4.0 from central in [default]
        org.spark-project.spark#unused;1.0.0 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   2   |   0   |   0   |   0   ||   2   |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-4019d059-06cc-4c6f-b848-b35caac9ad6c
        confs: [default]
        0 artifacts copied, 2 already retrieved (0kB/24ms)
20/08/20 19:29:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/08/20 19:29:16 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
20/08/20 19:29:19 WARN SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect.
20/08/20 19:29:23 WARN GenericRecordFullPayloadGenerator: The schema does not have any collections/complex fields. Cannot achieve minPayloadSize : 70000
Fetching source
Fetched source
Nishith souce limit 300000000
Root path => /user/hive/warehouse/hudi-integ-test-suite/input source limit => 300000000
Last checkpoint Option{val=null}
Nishith paths selected (Option{val=hdfs://namenode:8020/user/hive/warehouse/hudi-integ-test-suite/input/1/958101f0-b5e9-4585-80ab-9350abc1a2c4.avro},1597951764580)
20/08/20 19:29:25 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
20/08/20 19:29:25 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
Executing hive sync
Sycning hive
Syncing target hoodie table with hive table(table1). Hive metastore URL :jdbc:hive2://hiveserver:10000/, basePath :/user/hive/warehouse/hudi-integ-test-suite/output
OK
Executing hive query node {}74cc29a3-7218-4886-91e9-d160c71bd94b
Syncing to hive node74cc29a3-7218-4886-91e9-d160c71bd94b
Sycning hive
Syncing target hoodie table with hive table(table1). Hive metastore URL :jdbc:hive2://hiveserver:10000/, basePath :/user/hive/warehouse/hudi-integ-test-suite/output
OK
20/08/20 19:30:34 WARN GenericRecordFullPayloadGenerator: The schema does not have any collections/complex fields. Cannot achieve minPayloadSize : 70000
20/08/20 19:30:34 WARN GenericRecordFullPayloadGenerator: The schema does not have any collections/complex fields. Cannot achieve minPayloadSize : 70000
20/08/20 19:30:34 WARN GenericRecordFullPayloadGenerator: The schema does not have any collections/complex fields. Cannot achieve minPayloadSize : 70000
20/08/20 19:30:35 WARN GenericRecordFullPayloadGenerator: The schema does not have any collections/complex fields. Cannot achieve minPayloadSize : 70000
Fetching source
Fetched source
Nishith souce limit 300000000
Root path => /user/hive/warehouse/hudi-integ-test-suite/input source limit => 300000000
Last checkpoint Option{val=null}
Nishith paths selected (Option{val=hdfs://namenode:8020/user/hive/warehouse/hudi-integ-test-suite/input/1/958101f0-b5e9-4585-80ab-9350abc1a2c4.avro,hdfs://namenode:8020/user/hive/warehouse/hudi-integ-test-suite/input/2/77e1743d-76d9-4c2c-b2d4-d249da37635b.avro,hdfs://namenode:8020/user/hive/warehouse/hudi-integ-test-suite/input/2/e8e66c21-e660-4a57-af09-d760a738dffa.avro,hdfs://namenode:8020/user/hive/warehouse/hudi-integ-test-suite/input/2/6cb9ea98-89a5-4911-8828-2a6d3396d388.avro,hdfs://namenode:8020/user/hive/warehouse/hudi-integ-test-suite/input/2/726c86dd-4893-4480-9bfc-62a9bb663611.avro},1597951835169)
20/08/20 19:30:35 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
20/08/20 19:30:35 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
20/08/20 19:30:35 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
20/08/20 19:30:35 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
20/08/20 19:30:35 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
20/08/20 19:30:35 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
Executing hive query node {}cab3d02a-012f-426c-aa7a-096deb643b71
Syncing to hive nodecab3d02a-012f-426c-aa7a-096deb643b71
Sycning hive
Syncing target hoodie table with hive table(table1). Hive metastore URL :jdbc:hive2://hiveserver:10000/, basePath :/user/hive/warehouse/hudi-integ-test-suite/output
OK
root@adhoc-2:/opt#
{code}
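
The key difference from the failing command in the report below is supplying the hive-sync classes alongside the integ-test bundle via --jars, so org.apache.hudi.sync.common.AbstractSyncTool resolves at class-load time. A minimal sketch of just that part, written as a wrapper script; the jar paths and versions are the ones from this docker setup and may differ elsewhere:
{code:java}
#!/bin/bash
# Minimal sketch of the workaround: put the hive-sync bundle on the classpath
# next to the integ-test bundle. Pass the remaining test-suite arguments
# (as in the full command above) to this script.
spark-submit \
  --jars /opt/hudi-hive-sync-bundle-0.6.1-SNAPSHOT.jar \
  --packages org.apache.spark:spark-avro_2.11:2.4.0 \
  --class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob \
  /opt/hudi-integ-test-bundle-0.6.1-SNAPSHOT.jar \
  "$@"
{code}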

> NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob
> ---------------------------------------------------------------------------
>
>                 Key: HUDI-1204
>                 URL: https://issues.apache.org/jira/browse/HUDI-1204
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Testing
>    Affects Versions: 0.6.1
>            Reporter: sivabalan narayanan
>            Assignee: Nishith Agarwal
>            Priority: Major
>         Attachments: complex-dag-cow-2.yaml
>
>
> I was trying to run HoodieTestSuiteJob in my local docker setup and ran into 
> a dependency issue.
>  
> {code:java}
> spark-submit --master local \
>   --class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob \
>   --packages com.databricks:spark-avro_2.11:4.0.0 \
>   /opt/hudi-integ-test-bundle-0.6.0-rc1.jar \
>   --source-ordering-field timestamp \
>   --target-base-path /user/hive/warehouse/hudi-test-suite/output \
>   --input-base-path /user/hive/warehouse/hudi-test-suite/input \
>   --target-table test_table \
>   --props file:///opt/test-source.properties \
>   --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
>   --source-class org.apache.hudi.utilities.sources.AvroDFSSource \
>   --input-file-size 12582912 \
>   --workload-yaml-path /var/hoodie/ws/docker/demo/config/test-suite/complex-dag-cow.yaml \
>   --table-type COPY_ON_WRITE \
>   --workload-generator-classname yaml
> {code}
>  
> {code:java}
> 20/08/19 21:42:26 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hudi/sync/common/AbstractSyncTool
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$Config.<init>(HoodieDeltaStreamer.java:279)
> at 
> org.apache.hudi.integ.testsuite.HoodieTestSuiteJob$HoodieTestSuiteConfig.<init>(HoodieTestSuiteJob.java:153)
> at 
> org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:114)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hudi.sync.common.AbstractSyncTool
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 26 more
>  {code}
> I tried adding hudi-sync-common as a dependency to hudi-utilities, but that 
> didn't fix the issue.
>  


