soumilshah1995 opened a new issue, #8309: URL: https://github.com/apache/hudi/issues/8309
Hello all, first of all, thank you very much for all the help from the community. I should mention that I am new to DeltaStreamer. I have worked a lot with Glue jobs, and I want to experiment with DeltaStreamer so I can make videos and teach the community.

I have set up a complete pipeline, AWS Aurora Postgres > DMS > S3, and I have an EMR 6.9 cluster with Spark 3. I am attaching links to sample Parquet files and a sample JSON file that shows what the data looks like.

Link to data files: https://drive.google.com/drive/folders/1BwNEK649hErbsWcYLZhqCWnaXFX3mIsg?usp=share_link

Here is how I submit jobs:

```
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /usr/lib/hudi/hudi-utilities-bundle.jar \
  --table-type COPY_ON_WRITE \
  --source-ordering-field replicadmstimestamp \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --target-base-path s3://sql-server-dms-demo/hudi/public/sales \
  --target-table invoice \
  --payload-class org.apache.hudi.common.model.AWSDmsAvroPayload \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator \
  --hoodie-conf hoodie.datasource.write.recordkey.field=invoiceid \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://sql-server-dms-demo/raw/public/sales/
```

# Error I get

```
23/03/28 13:08:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/03/28 13:08:49 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at ip-172-32-147-4.ec2.internal/172.32.147.4:8032
23/03/28 13:08:50 INFO Configuration: resource-types.xml not found
23/03/28 13:08:50 INFO ResourceUtils: Unable to find 'resource-types.xml'.
23/03/28 13:08:50 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (6144 MB per container)
23/03/28 13:08:50 INFO Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
23/03/28 13:08:50 INFO Client: Setting up container launch context for our AM
23/03/28 13:08:50 INFO Client: Setting up the launch environment for our AM container
23/03/28 13:08:50 INFO Client: Preparing resources for our AM container
23/03/28 13:08:50 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
23/03/28 13:08:52 INFO Client: Uploading resource file:/mnt/tmp/spark-7f0fabb5-07de-43c3-8a26-a2325d5be63a/__spark_libs__363124573059127100.zip -> hdfs://ip-172-32-147-4.ec2.internal:8020/user/hadoop/.sparkStaging/application_1680007316515_0003/__spark_libs__363124573059127100.zip
23/03/28 13:08:53 INFO Client: Uploading resource file:/usr/lib/hudi/hudi-utilities-bundle_2.12-0.12.1-amzn-0.jar -> hdfs://ip-172-32-147-4.ec2.internal:8020/user/hadoop/.sparkStaging/application_1680007316515_0003/hudi-utilities-bundle_2.12-0.12.1-amzn-0.jar
23/03/28 13:08:53 INFO Client: Uploading resource file:/etc/spark/conf.dist/hive-site.xml -> hdfs://ip-172-32-147-4.ec2.internal:8020/user/hadoop/.sparkStaging/application_1680007316515_0003/hive-site.xml
23/03/28 13:08:54 INFO Client: Uploading resource file:/etc/hudi/conf.dist/hudi-defaults.conf -> hdfs://ip-172-32-147-4.ec2.internal:8020/user/hadoop/.sparkStaging/application_1680007316515_0003/hudi-defaults.conf
23/03/28 13:08:54 INFO Client: Uploading resource file:/mnt/tmp/spark-7f0fabb5-07de-43c3-8a26-a2325d5be63a/__spark_conf__2001263387666561545.zip -> hdfs://ip-172-32-147-4.ec2.internal:8020/user/hadoop/.sparkStaging/application_1680007316515_0003/__spark_conf__.zip
23/03/28 13:08:54 INFO SecurityManager: Changing view acls to: hadoop
23/03/28 13:08:54 INFO SecurityManager: Changing modify acls to: hadoop
23/03/28 13:08:54 INFO SecurityManager: Changing view acls groups to: 
23/03/28 13:08:54 INFO SecurityManager: Changing modify acls groups to: 
23/03/28 13:08:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
23/03/28 13:08:54 INFO Client: Submitting application application_1680007316515_0003 to ResourceManager
23/03/28 13:08:54 INFO YarnClientImpl: Submitted application application_1680007316515_0003
23/03/28 13:08:55 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
23/03/28 13:08:55 INFO Client: 
	 client token: N/A
	 diagnostics: AM container is launched, waiting for AM container to Register with RM
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1680008934287
	 final status: UNDEFINED
	 tracking URL: http://ip-172-32-147-4.ec2.internal:20888/proxy/application_1680007316515_0003/
	 user: hadoop
23/03/28 13:08:56 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
23/03/28 13:08:57 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
23/03/28 13:08:58 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
23/03/28 13:08:59 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
23/03/28 13:09:00 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
23/03/28 13:09:01 INFO Client: Application report for application_1680007316515_0003 (state: RUNNING)
23/03/28 13:09:01 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: ip-172-32-12-21.ec2.internal
	 ApplicationMaster RPC port: 34367
	 queue: default
	 start time: 1680008934287
	 final status: UNDEFINED
	 tracking URL: http://ip-172-32-147-4.ec2.internal:20888/proxy/application_1680007316515_0003/
	 user: hadoop
23/03/28 13:09:02 INFO Client: Application report for application_1680007316515_0003 (state: RUNNING)
23/03/28 13:09:03 INFO Client: Application report for application_1680007316515_0003 (state: RUNNING)
23/03/28 13:09:04 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
23/03/28 13:09:04 INFO Client: 
	 client token: N/A
	 diagnostics: AM container is launched, waiting for AM container to Register with RM
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1680008934287
	 final status: UNDEFINED
	 tracking URL: http://ip-172-32-147-4.ec2.internal:20888/proxy/application_1680007316515_0003/
	 user: hadoop
23/03/28 13:09:05 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
23/03/28 13:09:06 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
23/03/28 13:09:07 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
23/03/28 13:09:08 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
23/03/28 13:09:09 INFO Client: Application report for application_1680007316515_0003 (state: RUNNING)
23/03/28 13:09:09 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: ip-172-32-12-21.ec2.internal
	 ApplicationMaster RPC port: 42741
	 queue: default
	 start time: 1680008934287
	 final status: UNDEFINED
	 tracking URL: http://ip-172-32-147-4.ec2.internal:20888/proxy/application_1680007316515_0003/
	 user: hadoop
23/03/28 13:09:10 INFO Client: Application report for application_1680007316515_0003 (state: RUNNING)
23/03/28 13:09:11 INFO Client: Application report for application_1680007316515_0003 (state: FINISHED)
23/03/28 13:09:11 INFO Client: 
	 client token: N/A
	 diagnostics: User class threw exception: java.io.IOException: Could not load key generator class org.apache.hudi.keygen.SimpleKeyGenerator
	at org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory.createKeyGenerator(HoodieSparkKeyGeneratorFactory.java:74)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.<init>(DeltaSync.java:235)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:675)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:146)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:119)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:742)
Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.keygen.SimpleKeyGenerator
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:91)
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:118)
	at org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory.createKeyGenerator(HoodieSparkKeyGeneratorFactory.java:72)
	... 10 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
	... 12 more
Caused by: java.lang.IllegalArgumentException: Property hoodie.datasource.write.partitionpath.field not found
	at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:67)
	at org.apache.hudi.common.config.TypedProperties.getString(TypedProperties.java:72)
	at org.apache.hudi.keygen.SimpleKeyGenerator.<init>(SimpleKeyGenerator.java:41)
	... 17 more

	 ApplicationMaster host: ip-172-32-12-21.ec2.internal
	 ApplicationMaster RPC port: 42741
	 queue: default
	 start time: 1680008934287
	 final status: FAILED
	 tracking URL: http://ip-172-32-147-4.ec2.internal:20888/proxy/application_1680007316515_0003/
	 user: hadoop
23/03/28 13:09:11 ERROR Client: Application diagnostics message: User class threw exception: java.io.IOException: Could not load key generator class org.apache.hudi.keygen.SimpleKeyGenerator
	at org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory.createKeyGenerator(HoodieSparkKeyGeneratorFactory.java:74)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.<init>(DeltaSync.java:235)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:675)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:146)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:119)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:742)
Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.keygen.SimpleKeyGenerator
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:91)
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:118)
	at org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory.createKeyGenerator(HoodieSparkKeyGeneratorFactory.java:72)
	... 10 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
	... 12 more
Caused by: java.lang.IllegalArgumentException: Property hoodie.datasource.write.partitionpath.field not found
	at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:67)
	at org.apache.hudi.common.config.TypedProperties.getString(TypedProperties.java:72)
	at org.apache.hudi.keygen.SimpleKeyGenerator.<init>(SimpleKeyGenerator.java:41)
	... 17 more
Exception in thread "main" org.apache.spark.SparkException: Application application_1680007316515_0003 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1354)
	at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1776)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1006)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1095)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1104)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/03/28 13:09:11 INFO ShutdownHookManager: Shutdown hook called
23/03/28 13:09:11 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-dbd4de25-f7b1-4875-b7fd-c599028ae4e0
23/03/28 13:09:11 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-7f0fabb5-07de-43c3-8a26-a2325d5be63a
Command exiting with ret '1'
```

* Any advice or feedback, and pointers to what I am doing wrong, would be greatly appreciated.
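The innermost `Caused by` in the trace, `java.lang.IllegalArgumentException: Property hoodie.datasource.write.partitionpath.field not found`, is thrown from the `SimpleKeyGenerator` constructor. This suggests that `SimpleKeyGenerator` needs a partition path field in addition to `hoodie.datasource.write.recordkey.field`. Below is a minimal, untested sketch of the same submission with that property added. `invoicedate` is a hypothetical column name. It is only a placeholder for whichever column the table should actually be partitioned by and has not been confirmed from the data files.

```shell
# Sketch only: same command as in the issue, plus one extra --hoodie-conf at the end.
# ASSUMPTION: "invoicedate" is a hypothetical partition column; replace it with a real
# column from the DMS Parquet output.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /usr/lib/hudi/hudi-utilities-bundle.jar \
  --table-type COPY_ON_WRITE \
  --source-ordering-field replicadmstimestamp \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --target-base-path s3://sql-server-dms-demo/hudi/public/sales \
  --target-table invoice \
  --payload-class org.apache.hudi.common.model.AWSDmsAvroPayload \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator \
  --hoodie-conf hoodie.datasource.write.recordkey.field=invoiceid \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://sql-server-dms-demo/raw/public/sales/ \
  --hoodie-conf hoodie.datasource.write.partitionpath.field=invoicedate
```

If the table is meant to be unpartitioned, the other option worth trying is to set `hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator` instead of `SimpleKeyGenerator`. Like the sketch above, this is a suggestion and has not been verified against this pipeline.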
