[ 
https://issues.apache.org/jira/browse/HUDI-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17711890#comment-17711890
 ] 

Sagar Sumit commented on HUDI-5997:
-----------------------------------

That's a good question. When you start the deltastreamer for the 
incremental source with the {{spark-submit}} command, set 
{{--schemaprovider-class}} to 
{{org.apache.hudi.utilities.schema.FilebasedSchemaProvider}} and pass the 
source schema file that you want to enforce as 
{{--hoodie-conf hoodie.deltastreamer.schemaprovider.source.schema.file=/path/to/source/schema.avsc}}
in the same command. The full command will look something like:
{code:java}
spark-submit \
--jars "<hudi-utilities-bundle_jar>,<other-jars-that-you-add-in-classpath>" \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer <hudi-utilities-bundle_jar> \
--table-type COPY_ON_WRITE \
--source-ordering-field <ordering key from source data> \
--target-base-path s3://bucket_name/path/for/s3_hudi_table \
--target-table s3_hudi_table \
--continuous \
--min-sync-interval-seconds 10 \
...
...
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
--hoodie-conf hoodie.deltastreamer.schemaprovider.source.schema.file=/path/to/source/schema.avsc \
--hoodie-conf hoodie.datasource.write.recordkey.field="<record key from source data>" \
...
...
--source-class org.apache.hudi.utilities.sources.S3EventsHoodieIncrSource \
--hoodie-conf hoodie.deltastreamer.source.hoodieincr.path=s3://bucket_name/path/for/s3_meta_table \
--hoodie-conf hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt=true
{code}
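
For reference, the file passed as the source schema is a regular Avro schema ({{.avsc}}) document. Here is a minimal sketch of creating one. The record name, namespace, and field names ({{id}}, {{ts}}, {{payload}}) are hypothetical placeholders and not taken from your data, and the temporary directory only stands in for wherever you actually keep the schema:

```shell
#!/bin/sh
set -eu

# Hypothetical location for illustration; in practice this is the path you pass
# via hoodie.deltastreamer.schemaprovider.source.schema.file.
SCHEMA_DIR="$(mktemp -d)"
SCHEMA_FILE="$SCHEMA_DIR/schema.avsc"

# Record name, namespace, and fields are illustrative placeholders.
# Replace them with the fields of your own source data.
cat > "$SCHEMA_FILE" <<'EOF'
{
  "type": "record",
  "name": "SourceRecord",
  "namespace": "example.hudi",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "ts", "type": "long"},
    {"name": "payload", "type": ["null", "string"], "default": null}
  ]
}
EOF

echo "Wrote $SCHEMA_FILE"
```

With a schema like this, {{id}} would be the value for {{hoodie.datasource.write.recordkey.field}} and {{ts}} the value for {{--source-ordering-field}}, assuming those are the fields you want to key and order on.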

> Support DFS Schema Provider with S3/GCS EventsHoodieIncrSource
> --------------------------------------------------------------
>
>                 Key: HUDI-5997
>                 URL: https://issues.apache.org/jira/browse/HUDI-5997
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: deltastreamer
>            Reporter: Sagar Sumit
>            Assignee: Léo Biscassi
>            Priority: Major
>             Fix For: 0.14.0
>
>
> See for more details



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
