rajgowtham24 opened a new issue #1835:
URL: https://github.com/apache/hudi/issues/1835
Hi all,
I'm new to Hudi and looking to leverage DeltaStreamer for JSON sources that
are available in my S3 bucket.
Below is the command I'm using:
Source file (JSON format)
{"empno":"8006","ename":"stuart","job":"salesman","hiredate":"2020-01-01 00:00:00"}
Code
spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  `ls /usr/lib/hudi/hudi-utilities-bundle_2.11-0.5.2-incubating.jar` \
  --table-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.JsonDFSSource \
  --target-base-path s3://gowtham_km/hudi/target \
  --target-table emp \
  --hoodie-conf hoodie.datasource.write.recordkey.field=empno,hoodie.deltastreamer.source.dfs.root=s3://gowtham_km/hudi/source \
  --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
  --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
  --props file:/usr/lib/hudi/hudi_utilities/delta-streamer-config/dfs-source.properties \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaProvider
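For context, `org.apache.hudi.utilities.schema.SchemaProvider` is the abstract base class; `--schemaprovider-class` expects a concrete implementation such as `org.apache.hudi.utilities.schema.FilebasedSchemaProvider`, which reads an Avro schema file configured in the properties file. A hedged sketch of what the referenced dfs-source.properties could then contain (the .avsc path below is a placeholder, not taken from this issue):

```properties
# Record key and DFS source root, as passed via --hoodie-conf above
hoodie.datasource.write.recordkey.field=empno
hoodie.deltastreamer.source.dfs.root=s3://gowtham_km/hudi/source
# Source schema for FilebasedSchemaProvider; this .avsc path is hypothetical
hoodie.deltastreamer.schemaprovider.source.schema.file=s3://gowtham_km/hudi/schema/source.avsc
```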
Error
Exception in thread "main" java.io.IOException: Could not load schema provider class org.apache.hudi.utilities.schema.SchemaProvider
	at org.apache.hudi.utilities.UtilHelpers.createSchemaProvider(UtilHelpers.java:101)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:364)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:95)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:89)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:294)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:80)
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
	at org.apache.hudi.utilities.UtilHelpers.createSchemaProvider(UtilHelpers.java:99)
	... 16 more
Caused by: java.lang.NoSuchMethodException: org.apache.hudi.utilities.schema.SchemaProvider.<init>(org.apache.hudi.common.util.TypedProperties, org.apache.spark.api.java.JavaSparkContext)
	at java.lang.Class.getConstructor0(Class.java:3110)
	at java.lang.Class.getConstructor(Class.java:1853)
	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:78)
	... 18 more
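The final Caused by is the key part: `Class.getConstructor` only returns public constructors, and `SchemaProvider` is an abstract base class, so `ReflectionUtils.loadClass` cannot find a usable `(TypedProperties, JavaSparkContext)` constructor to instantiate. A small self-contained Java sketch of the same failure mode (`BaseProvider` is a hypothetical stand-in, not a Hudi class):

```java
import java.lang.reflect.Constructor;

// Illustrative only: mirrors why the reflective load above fails.
// Class.getConstructor returns only public constructors, so an abstract
// base class with a non-public constructor yields NoSuchMethodException.
public class ReflectionDemo {
    static abstract class BaseProvider {          // stand-in for SchemaProvider
        protected BaseProvider(String props) {}   // non-public constructor
    }

    static String tryLoad() {
        try {
            Constructor<BaseProvider> ctor =
                BaseProvider.class.getConstructor(String.class);
            ctor.newInstance("unused");
            return "instantiated";
        } catch (NoSuchMethodException e) {
            // Same exception type as the bottom of the stack trace above
            return "NoSuchMethodException: " + e.getMessage();
        } catch (ReflectiveOperationException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryLoad());
    }
}
```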
20/07/15 15:51:36 INFO ShutdownHookManager: Shutdown hook called
20/07/15 15:51:36 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-51f5cf1d-db65-4c2b-853e-8e64c0666648
20/07/15 15:51:36 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-c2584053-3620-48dc-9380-43318af38392
Expectation
To start with, I would like to load the JSON file into the target table;
later I will add the --continuous option so that new files are loaded into
the target table automatically.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]