Amar1404 opened a new issue, #8626: URL: https://github.com/apache/hudi/issues/8626
**Describe the problem you faced**

I am using Hudi DeltaStreamer with `JsonKafkaSource` for a **SyncOnce** use case. I am trying to rely on schema inference, but it does not work because DeltaStreamer throws the following error:

```
23/05/03 08:37:59 ERROR ApplicationMaster: User class threw exception: org.apache.hudi.exception.HoodieException: Please provide a valid schema provider class!
org.apache.hudi.exception.HoodieException: Please provide a valid schema provider class!
    at org.apache.hudi.utilities.sources.InputBatch.getSchemaProvider(InputBatch.java:56) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:111) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:426) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:401) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:305) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571) ~[__app__.jar:0.12.1-amzn-0]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_362]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_362]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_362]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:742) ~[spark-yarn_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
```

**To Reproduce**

Steps to reproduce the behavior:

1. Create a Kafka topic containing JSON-formatted messages.
2. Run the Hudi DeltaStreamer command without providing a schema provider class.

**Expected behavior**

DeltaStreamer should be able to infer the schema from the source and write the output to the sink, even when a transformation is applied, such as selecting a few columns or flattening the JSON.

**Environment Description**

* Hudi version : 0.12.1
* Spark version : 3.3
* Hive version : 3.1.3
* Hadoop version :
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Additional context**

For the JSON source type, the `SourceFormatAdapter` class uses the method **fetchNewDataInRowFormat**, which calls `InputBatch.getSchemaProvider()`. The following check in `InputBatch` causes the error:

```java
public SchemaProvider getSchemaProvider() {
  if (batch.isPresent() && schemaProvider == null) {
    throw new HoodieException("Please provide a valid schema provider class!");
  }
  return Option.ofNullable(schemaProvider).orElse(new NullSchemaProvider());
}
```

The exception is thrown even though we have the option of NullSchemaPro

**Stacktrace**

```
org.apache.hudi.exception.HoodieException: Please provide a valid schema provider class!
    at org.apache.hudi.utilities.sources.InputBatch.getSchemaProvider(InputBatch.java:56) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInRowFormat(SourceFormatAdapter.java:111) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:426) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:401) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:305) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202) ~[__app__.jar:0.12.1-amzn-0]
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571) ~[__app__.jar:0.12.1-amzn-0]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_362]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_362]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_362]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:742) ~[spark-yarn_2.12-3.3.0-amzn-1.jar:3.3.0-amzn-1]
23/05/03 08:37:59 INFO DeltaSync: Shutting down embedded timeline server
```
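To make the failure mode concrete, here is a minimal, self-contained Java sketch of the `getSchemaProvider()` check quoted in the issue. It uses simplified, hypothetical stand-ins rather than Hudi's real classes: `SchemaProvider` is modeled as an interface with a made-up `sourceSchema()` method, `java.util.Optional` replaces Hudi's own `Option`, and `IllegalStateException` replaces `HoodieException`. It only illustrates why a non-empty JSON batch with no configured schema provider fails before the `NullSchemaProvider` fallback can apply. It is not Hudi's actual implementation.

```java
import java.util.Optional;

// Sketch of the guard in InputBatch.getSchemaProvider() (Hudi 0.12.1).
// All nested types are hypothetical stand-ins, NOT the real Hudi classes.
public class SchemaProviderGuardSketch {

    // Hypothetical stand-in for Hudi's SchemaProvider.
    interface SchemaProvider {
        String sourceSchema();
    }

    // Hypothetical stand-in for NullSchemaProvider: a provider with no schema.
    static final class NullSchemaProvider implements SchemaProvider {
        @Override
        public String sourceSchema() {
            return null;
        }
    }

    // Hypothetical stand-in for InputBatch<T>: an optional batch of data plus
    // the schema provider the source was built with (null if none configured).
    static final class InputBatch<T> {
        private final Optional<T> batch;
        private final SchemaProvider schemaProvider;

        InputBatch(Optional<T> batch, SchemaProvider schemaProvider) {
            this.batch = batch;
            this.schemaProvider = schemaProvider;
        }

        SchemaProvider getSchemaProvider() {
            // Same condition as the quoted code: data was fetched but there is
            // no schema provider, so we fail before reaching the fallback below.
            if (batch.isPresent() && schemaProvider == null) {
                throw new IllegalStateException("Please provide a valid schema provider class!");
            }
            // The NullSchemaProvider fallback is only reachable for an empty batch.
            return Optional.ofNullable(schemaProvider).orElse(new NullSchemaProvider());
        }
    }

    public static void main(String[] args) {
        // Empty batch, no provider: falls back to NullSchemaProvider.
        InputBatch<String> empty = new InputBatch<>(Optional.empty(), null);
        System.out.println(empty.getSchemaProvider() instanceof NullSchemaProvider); // prints "true"

        // Non-empty batch (e.g. JSON read from Kafka), no provider: throws.
        InputBatch<String> nonEmpty = new InputBatch<>(Optional.of("{\"id\": 1}"), null);
        try {
            nonEmpty.getSchemaProvider();
        } catch (IllegalStateException e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}
```

Running `main` prints `true` and then the error message. This matches the reported trace, where `fetchNewDataInRowFormat` calls `getSchemaProvider()` on a batch that contains data and no schema provider was configured.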
