soumilshah1995 opened a new issue, #8309:
URL: https://github.com/apache/hudi/issues/8309

   Hello all,
   
   First, thank you very much for all the help from the community. I should mention that I am new to DeltaStreamer. I have worked a lot with Glue jobs, and I want to experiment with DeltaStreamer so that I can make videos and teach the community.
   
   I have set up a complete pipeline, AWS Aurora PostgreSQL > DMS > S3, and I have an EMR 6.9 cluster with Spark 3.
   
   Attaching links to sample Parquet files, along with a sample JSON record showing what the data looks like:
   
![image](https://user-images.githubusercontent.com/39345855/228249922-ac19cf34-9112-40ff-b465-db1e9006eb43.png)
   
   Link to the data files: https://drive.google.com/drive/folders/1BwNEK649hErbsWcYLZhqCWnaXFX3mIsg?usp=share_link
   
   
   Here is how I submit the job:
   ```
   spark-submit \
       --master yarn \
       --deploy-mode cluster \
       --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
       --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
       /usr/lib/hudi/hudi-utilities-bundle.jar \
       --table-type COPY_ON_WRITE \
       --source-ordering-field replicadmstimestamp \
       --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
       --target-base-path s3://sql-server-dms-demo/hudi/public/sales \
       --target-table invoice \
       --payload-class org.apache.hudi.common.model.AWSDmsAvroPayload \
       --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator \
       --hoodie-conf hoodie.datasource.write.recordkey.field=invoiceid \
       --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://sql-server-dms-demo/raw/public/sales/
   ```
   
   # Error I get
   ```
   23/03/28 13:08:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   23/03/28 13:08:49 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at ip-172-32-147-4.ec2.internal/172.32.147.4:8032
   23/03/28 13:08:50 INFO Configuration: resource-types.xml not found
   23/03/28 13:08:50 INFO ResourceUtils: Unable to find 'resource-types.xml'.
   23/03/28 13:08:50 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (6144 MB per container)
   23/03/28 13:08:50 INFO Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
   23/03/28 13:08:50 INFO Client: Setting up container launch context for our AM
   23/03/28 13:08:50 INFO Client: Setting up the launch environment for our AM container
   23/03/28 13:08:50 INFO Client: Preparing resources for our AM container
   23/03/28 13:08:50 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
   23/03/28 13:08:52 INFO Client: Uploading resource file:/mnt/tmp/spark-7f0fabb5-07de-43c3-8a26-a2325d5be63a/__spark_libs__363124573059127100.zip -> hdfs://ip-172-32-147-4.ec2.internal:8020/user/hadoop/.sparkStaging/application_1680007316515_0003/__spark_libs__363124573059127100.zip
   23/03/28 13:08:53 INFO Client: Uploading resource file:/usr/lib/hudi/hudi-utilities-bundle_2.12-0.12.1-amzn-0.jar -> hdfs://ip-172-32-147-4.ec2.internal:8020/user/hadoop/.sparkStaging/application_1680007316515_0003/hudi-utilities-bundle_2.12-0.12.1-amzn-0.jar
   23/03/28 13:08:53 INFO Client: Uploading resource file:/etc/spark/conf.dist/hive-site.xml -> hdfs://ip-172-32-147-4.ec2.internal:8020/user/hadoop/.sparkStaging/application_1680007316515_0003/hive-site.xml
   23/03/28 13:08:54 INFO Client: Uploading resource file:/etc/hudi/conf.dist/hudi-defaults.conf -> hdfs://ip-172-32-147-4.ec2.internal:8020/user/hadoop/.sparkStaging/application_1680007316515_0003/hudi-defaults.conf
   23/03/28 13:08:54 INFO Client: Uploading resource file:/mnt/tmp/spark-7f0fabb5-07de-43c3-8a26-a2325d5be63a/__spark_conf__2001263387666561545.zip -> hdfs://ip-172-32-147-4.ec2.internal:8020/user/hadoop/.sparkStaging/application_1680007316515_0003/__spark_conf__.zip
   23/03/28 13:08:54 INFO SecurityManager: Changing view acls to: hadoop
   23/03/28 13:08:54 INFO SecurityManager: Changing modify acls to: hadoop
   23/03/28 13:08:54 INFO SecurityManager: Changing view acls groups to:
   23/03/28 13:08:54 INFO SecurityManager: Changing modify acls groups to:
   23/03/28 13:08:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
   23/03/28 13:08:54 INFO Client: Submitting application application_1680007316515_0003 to ResourceManager
   23/03/28 13:08:54 INFO YarnClientImpl: Submitted application application_1680007316515_0003
   23/03/28 13:08:55 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
   23/03/28 13:08:55 INFO Client:
         client token: N/A
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1680008934287
         final status: UNDEFINED
         tracking URL: http://ip-172-32-147-4.ec2.internal:20888/proxy/application_1680007316515_0003/
         user: hadoop
   23/03/28 13:08:56 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
   23/03/28 13:08:57 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
   23/03/28 13:08:58 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
   23/03/28 13:08:59 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
   23/03/28 13:09:00 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
   23/03/28 13:09:01 INFO Client: Application report for application_1680007316515_0003 (state: RUNNING)
   23/03/28 13:09:01 INFO Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: ip-172-32-12-21.ec2.internal
         ApplicationMaster RPC port: 34367
         queue: default
         start time: 1680008934287
         final status: UNDEFINED
         tracking URL: http://ip-172-32-147-4.ec2.internal:20888/proxy/application_1680007316515_0003/
         user: hadoop
   23/03/28 13:09:02 INFO Client: Application report for application_1680007316515_0003 (state: RUNNING)
   23/03/28 13:09:03 INFO Client: Application report for application_1680007316515_0003 (state: RUNNING)
   23/03/28 13:09:04 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
   23/03/28 13:09:04 INFO Client:
         client token: N/A
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1680008934287
         final status: UNDEFINED
         tracking URL: http://ip-172-32-147-4.ec2.internal:20888/proxy/application_1680007316515_0003/
         user: hadoop
   23/03/28 13:09:05 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
   23/03/28 13:09:06 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
   23/03/28 13:09:07 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
   23/03/28 13:09:08 INFO Client: Application report for application_1680007316515_0003 (state: ACCEPTED)
   23/03/28 13:09:09 INFO Client: Application report for application_1680007316515_0003 (state: RUNNING)
   23/03/28 13:09:09 INFO Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: ip-172-32-12-21.ec2.internal
         ApplicationMaster RPC port: 42741
         queue: default
         start time: 1680008934287
         final status: UNDEFINED
         tracking URL: http://ip-172-32-147-4.ec2.internal:20888/proxy/application_1680007316515_0003/
         user: hadoop
   23/03/28 13:09:10 INFO Client: Application report for application_1680007316515_0003 (state: RUNNING)
   23/03/28 13:09:11 INFO Client: Application report for application_1680007316515_0003 (state: FINISHED)
   23/03/28 13:09:11 INFO Client:
         client token: N/A
         diagnostics: User class threw exception: java.io.IOException: Could not load key generator class org.apache.hudi.keygen.SimpleKeyGenerator
        at org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory.createKeyGenerator(HoodieSparkKeyGeneratorFactory.java:74)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.<init>(DeltaSync.java:235)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:675)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:146)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:119)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:742)
   Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.keygen.SimpleKeyGenerator
        at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:91)
        at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:118)
        at org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory.createKeyGenerator(HoodieSparkKeyGeneratorFactory.java:72)
        ... 10 more
   Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
        ... 12 more
   Caused by: java.lang.IllegalArgumentException: Property hoodie.datasource.write.partitionpath.field not found
        at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:67)
        at org.apache.hudi.common.config.TypedProperties.getString(TypedProperties.java:72)
        at org.apache.hudi.keygen.SimpleKeyGenerator.<init>(SimpleKeyGenerator.java:41)
        ... 17 more
   
         ApplicationMaster host: ip-172-32-12-21.ec2.internal
         ApplicationMaster RPC port: 42741
         queue: default
         start time: 1680008934287
         final status: FAILED
         tracking URL: http://ip-172-32-147-4.ec2.internal:20888/proxy/application_1680007316515_0003/
         user: hadoop
   23/03/28 13:09:11 ERROR Client: Application diagnostics message: User class threw exception: java.io.IOException: Could not load key generator class org.apache.hudi.keygen.SimpleKeyGenerator
        at org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory.createKeyGenerator(HoodieSparkKeyGeneratorFactory.java:74)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.<init>(DeltaSync.java:235)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:675)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:146)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:119)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:742)
   Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.keygen.SimpleKeyGenerator
        at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:91)
        at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:118)
        at org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory.createKeyGenerator(HoodieSparkKeyGeneratorFactory.java:72)
        ... 10 more
   Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
        ... 12 more
   Caused by: java.lang.IllegalArgumentException: Property hoodie.datasource.write.partitionpath.field not found
        at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:67)
        at org.apache.hudi.common.config.TypedProperties.getString(TypedProperties.java:72)
        at org.apache.hudi.keygen.SimpleKeyGenerator.<init>(SimpleKeyGenerator.java:41)
        ... 17 more
   
   Exception in thread "main" org.apache.spark.SparkException: Application application_1680007316515_0003 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1354)
        at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1776)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1006)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1095)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1104)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   23/03/28 13:09:11 INFO ShutdownHookManager: Shutdown hook called
   23/03/28 13:09:11 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-dbd4de25-f7b1-4875-b7fd-c599028ae4e0
   23/03/28 13:09:11 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-7f0fabb5-07de-43c3-8a26-a2325d5be63a
   Command exiting with ret '1'
   ```
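Reading the innermost `Caused by` in the log, `SimpleKeyGenerator` fails in its constructor because `hoodie.datasource.write.partitionpath.field` is not set, and my command never sets it. Below is a minimal sketch of what I plan to try next, not a confirmed fix. It assumes that `org.apache.hudi.keygen.NonpartitionedKeyGenerator` is appropriate when the `invoice` table should not be partitioned, and that, when a partition column is wanted, its name is passed to `deltastreamer_args`, a hypothetical helper I made up for this post. Building the arguments in a bash array also avoids losing a `\` line continuation.

```bash
#!/usr/bin/env bash
# Sketch only: assemble the HoodieDeltaStreamer spark-submit arguments in a
# bash array, so a missing "\" continuation cannot silently drop flags.
set -euo pipefail

# deltastreamer_args is a hypothetical helper for this post.
#   $1 (optional): partition column name. When empty, the table is assumed
#   to be non-partitioned and NonpartitionedKeyGenerator is used instead of
#   SimpleKeyGenerator, so no partition path field is required (assumption).
deltastreamer_args() {
  local partition_field="${1:-}"
  local args=(
    --master yarn
    --deploy-mode cluster
    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
    --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
    /usr/lib/hudi/hudi-utilities-bundle.jar
    --table-type COPY_ON_WRITE
    --source-ordering-field replicadmstimestamp
    --source-class org.apache.hudi.utilities.sources.ParquetDFSSource
    --target-base-path s3://sql-server-dms-demo/hudi/public/sales
    --target-table invoice
    --payload-class org.apache.hudi.common.model.AWSDmsAvroPayload
    --hoodie-conf hoodie.datasource.write.recordkey.field=invoiceid
    --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://sql-server-dms-demo/raw/public/sales/
  )

  if [[ -n "$partition_field" ]]; then
    # Keep SimpleKeyGenerator, but supply the property the stack trace
    # reports as missing.
    args+=(
      --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
      --hoodie-conf "hoodie.datasource.write.partitionpath.field=${partition_field}"
    )
  else
    # Assumption: a non-partitioned table needs no partition path field.
    args+=(
      --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
    )
  fi

  # Print one argument per line; callers read them back with mapfile.
  printf '%s\n' "${args[@]}"
}
```

Usage on the EMR master node (assuming `spark-submit` is on the `PATH`): run `mapfile -t args < <(deltastreamer_args)` and then `spark-submit "${args[@]}"`. Passing a column name, e.g. `deltastreamer_args <partition_column>`, switches back to `SimpleKeyGenerator` with the partition path set; `<partition_column>` is a placeholder, not a column I have confirmed exists in the DMS output.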
   
   * Any advice, feedback, or pointers on what I am doing wrong would be great.

