SureshK-T2S edited a comment on issue #2406:
URL: https://github.com/apache/hudi/issues/2406#issuecomment-774483216
Hello, thank you guys for giving me time with this. I have since run into an
issue with HoodieMultiTableDeltaStreamer, in particular getting it to work with
the ParquetDFS source. It fails because of the SchemaProvider, or the lack of
one.
Command:
```
spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer \
  --packages org.apache.hudi:hudi-spark-bundle_2.11:0.6.0,org.apache.spark:spark-avro_2.11:2.4.4 \
  --master yarn --deploy-mode client \
  /usr/lib/hudi/hudi-utilities-bundle.jar --table-type COPY_ON_WRITE \
  --props s3:///temp/config/s3-source.properties \
  --config-folder s3:///temp/hudi-ingestion-config/ \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --continuous --source-ordering-field updated_at \
  --base-path-prefix s3://hudi-data-lake --target-table dummy_table --op UPSERT
```
S3 properties:
```
hoodie.deltastreamer.ingestion.tablesToBeIngested=db.table1,db.table2
hoodie.deltastreamer.ingestion.db.table1.configFile=s3://hudi-data-lake/configs/db/table1.properties
hoodie.deltastreamer.ingestion.db.table2.configFile=s3://hudi-data-lake/configs/db/table2.properties
```
Table1 properties:
```
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.deltastreamer.source.dfs.root=s3://root_folder_1
hoodie.datasource.write.recordkey.field=id
hoodie.datasource.write.partitionpath.field=year,month,day
```
Table2 properties:
```
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.deltastreamer.source.dfs.root=s3://root_folder_2
hoodie.datasource.write.recordkey.field=id
hoodie.datasource.write.partitionpath.field=year,month,day
```
Error:
```
Exception in thread "main" java.lang.NullPointerException
	at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.populateSchemaProviderProps(HoodieMultiTableDeltaStreamer.java:148)
	at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.populateTableExecutionContextList(HoodieMultiTableDeltaStreamer.java:128)
	at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.<init>(HoodieMultiTableDeltaStreamer.java:78)
	at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.main(HoodieMultiTableDeltaStreamer.java:201)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
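In case it helps: since none of my property files configure a schema provider, one thing I'm considering is pointing each table at an explicit Avro schema via `FilebasedSchemaProvider`. A sketch of what that might look like for table1 (the `.avsc` paths below are placeholders for my own schema files, not anything Hudi ships):

```
# Hypothetical additions to table1.properties: supply explicit source/target schemas
hoodie.deltastreamer.schemaprovider.source.schema.file=s3://hudi-data-lake/schemas/table1.avsc
hoodie.deltastreamer.schemaprovider.target.schema.file=s3://hudi-data-lake/schemas/table1.avsc
```

combined with `--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider` on the spark-submit command. That said, the stack trace shows the NPE inside `populateSchemaProviderProps` while the per-table configs are being parsed, so I'm not sure this alone would fix it.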
I'm reaching the final steps of my setup and really hoping to get this
resolved so I can go live soon!