SureshK-T2S edited a comment on issue #2406:
URL: https://github.com/apache/hudi/issues/2406#issuecomment-774483216


   Hello, and thank you for your time with this. I have since run into an issue with HoodieMultiTableDeltaStreamer, in particular getting it to work with the ParquetDFS source. I'm hitting an error that seems to be caused by the SchemaProvider, or rather the lack of one.
   
   Command:
   ```
   spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer \
     --packages org.apache.hudi:hudi-spark-bundle_2.11:0.6.0,org.apache.spark:spark-avro_2.11:2.4.4 \
     --master yarn --deploy-mode client \
     /usr/lib/hudi/hudi-utilities-bundle.jar --table-type COPY_ON_WRITE \
     --props s3:///temp/config/s3-source.properties \
     --config-folder s3:///temp/hudi-ingestion-config/ \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --continuous --source-ordering-field updated_at \
     --base-path-prefix s3://hudi-data-lake --target-table dummy_table --op UPSERT
   ```
   
   S3 properties:
   ```
   hoodie.deltastreamer.ingestion.tablesToBeIngested=db.table1,db.table2
   hoodie.deltastreamer.ingestion.db.table1.configFile=s3://hudi-data-lake/configs/db/table1.properties
   hoodie.deltastreamer.ingestion.db.table2.configFile=s3://hudi-data-lake/configs/db/table2.properties
   ```
   
   Table1 properties:
   ```
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
   hoodie.deltastreamer.source.dfs.root=s3://root_folder_1
   hoodie.datasource.write.recordkey.field=id
   hoodie.datasource.write.partitionpath.field=year,month,day
   ```
   
   Table2 properties:
   ```
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
   hoodie.deltastreamer.source.dfs.root=s3://root_folder_2
   hoodie.datasource.write.recordkey.field=id
   hoodie.datasource.write.partitionpath.field=year,month,day
   ```
   
   Error:
   ```
   Exception in thread "main" java.lang.NullPointerException
        at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.populateSchemaProviderProps(HoodieMultiTableDeltaStreamer.java:148)
        at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.populateTableExecutionContextList(HoodieMultiTableDeltaStreamer.java:128)
        at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.<init>(HoodieMultiTableDeltaStreamer.java:78)
        at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.main(HoodieMultiTableDeltaStreamer.java:201)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   ```
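   
   The stack trace shows the NPE is thrown while populating schema provider properties, so my guess (just an assumption from reading the 0.6.0 source, not something I've confirmed) is that HoodieMultiTableDeltaStreamer expects Confluent-style schema registry settings per table, along these lines (hypothetical values):
   ```
   # assumed schema registry settings - values below are placeholders, not my actual config
   hoodie.deltastreamer.schemaprovider.registry.baseUrl=http://localhost:8081/subjects/
   hoodie.deltastreamer.schemaprovider.registry.urlSuffix=-value/versions/latest
   ```
   Since ParquetDFSSource can read the schema from the Parquet files themselves, I was hoping not to need a schema registry at all.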
   
   I'm at the final steps of my setup, so I'm really hoping to get this resolved and go live soon!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

