Aruun opened a new issue #4701:
URL: https://github.com/apache/hudi/issues/4701
Command used:
spark-submit \
  --jars /usr/lib/hudi/hudi-utilities-bundle_2.12-0.8.0-amzn-0.jar,/usr/lib/spark/external/lib/spark-avro.jar \
  --deploy-mode cluster --master yarn \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /usr/lib/hudi/hudi-utilities-bundle_2.12-0.8.0-amzn-0.jar \
  --table-type COPY_ON_WRITE \
  --source-ordering-field registration_dttm \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --target-base-path s3://<bucketname><path> \
  --target-table hudi_test \
  --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer,org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
  --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3://<bucket><path>,hoodie.datasource.write.recordkey.field=<field>,hoodie.datasource.write.partitionpath.field=<field>
Steps to reproduce the behavior:
1. Run the above command on EMR 6.4 with Spark 3.1.2 and Hive 3.1.2.
2. Rearranging the command's arguments did not help; the same error occurs.
**Expected behavior**
DeltaStreamer loads the data into the Hudi dataset.
**Environment Description**
* Hudi version : 0.8.0
* Spark version : 3.1.2
* Hive version : 3.1.2
* Hadoop version :
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
**Stacktrace**
```
22/01/27 16:56:53 ERROR Client: Application diagnostics message: User class threw exception: java.io.IOException: Could not load key generator class org.apache.hudi.keygen.SimpleKeyGenerator
    at org.apache.hudi.DataSourceUtils.createKeyGenerator(DataSourceUtils.java:99)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.<init>(DeltaSync.java:209)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:562)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:140)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:103)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:472)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:735)
Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:98)
    at org.apache.hudi.DataSourceUtils.createKeyGenerator(DataSourceUtils.java:97)
    ... 10 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:87)
    ... 12 more
Caused by: java.lang.IllegalArgumentException: Property hoodie.datasource.write.recordkey.field not found
    at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:43)
    at org.apache.hudi.common.config.TypedProperties.getString(TypedProperties.java:56)
    at org.apache.hudi.keygen.SimpleKeyGenerator.<init>(SimpleKeyGenerator.java:36)
    ... 17 more
Exception in thread "main" org.apache.spark.SparkException: Application application_1643297796042_0012 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1253)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1645)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1047)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1056)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
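
**Possible cause (unconfirmed)**
The innermost exception says `hoodie.datasource.write.recordkey.field` was not found, although the command does pass it. One reading, which is an assumption and not something this report confirms, is that `--hoodie-conf` accepts a single `key=value` per occurrence and is meant to be repeated, so the comma-joined string is taken as one property whose key is `hoodie.deltastreamer.source.dfs.root`. The Python sketch below models that assumed parsing and expands the joined string into repeated flags. The helper names `parse_single_conf` and `expand_hoodie_conf` are hypothetical and are not part of Hudi.

```python
from typing import Dict, List


def parse_single_conf(conf: str) -> Dict[str, str]:
    """Model of the ASSUMED behavior: one --hoodie-conf value yields one
    property, split at the first '=' only (commas are not separators)."""
    key, _, value = conf.partition("=")
    return {key: value}


def expand_hoodie_conf(joined: str) -> List[str]:
    """Turn 'k1=v1,k2=v2' into repeated '--hoodie-conf k=v' arguments.
    Note: this naive split is wrong for values that themselves contain commas."""
    args: List[str] = []
    for pair in joined.split(","):
        pair = pair.strip()
        if pair:
            args += ["--hoodie-conf", pair]
    return args


# The value passed to --hoodie-conf in the command above (placeholders kept).
joined = (
    "hoodie.deltastreamer.source.dfs.root=s3://<bucket><path>,"
    "hoodie.datasource.write.recordkey.field=<field>,"
    "hoodie.datasource.write.partitionpath.field=<field>"
)

# Under the assumption, only the first key ends up defined.
props = parse_single_conf(joined)
print("hoodie.datasource.write.recordkey.field" in props)  # False

# Arguments with one --hoodie-conf per property.
print(" ".join(expand_hoodie_conf(joined)))
```

If the assumption holds, replacing the single `--hoodie-conf` argument with the three printed ones would make the record key property visible to `SimpleKeyGenerator`.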