xushiyan commented on issue #3831: URL: https://github.com/apache/hudi/issues/3831#issuecomment-948120800
1. DeltaStreamer is a Spark application that is meant to run on a cluster, so I'm not sure how it fits into a CLI utility. If you just want a CLI command that triggers the job submission, then yes, nothing stops you from doing that.

2. Most DeltaStreamer configs are translated into Hudi options internally. For example, --source-ordering-field maps to the precombine field option (hoodie.datasource.write.precombine.field). I'd suggest working out all the Hudi configs your application needs from DeltaStreamer's params, building a map of Hudi write options from scratch, and passing that map to the datasource writer. The extra work is that you may need to orchestrate the datasource writer yourself, for example scheduling it periodically and triggering compaction in a separate process. Not all DeltaStreamer params map to Hudi write options: --continuous, for instance, selects an orchestration mode and is not a writer option. DeltaStreamer sits at a higher level than datasource writing, so you can't simply swap one for the other.

3. See 2).

Hope this helps.
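To illustrate point 2, here is a minimal sketch of building the write-options map by hand. It assumes a small, hypothetical helper (to_hudi_write_options) and covers only a few DeltaStreamer params. The hoodie.* keys are standard Hudi datasource write options, but the mapping table itself is illustrative and incomplete, so check every key against the configuration reference for the Hudi version you run. Settings that DeltaStreamer reads from its --props file (record key, partition path, and so on) are assumed to already be hoodie.* keys and are passed through unchanged. Orchestration-only params such as --continuous are deliberately dropped, because scheduling becomes your job once you use the datasource writer.

```python
from typing import Dict, Mapping

# Hypothetical, partial mapping from DeltaStreamer CLI params to Hudi
# datasource write option keys. Illustrative only; verify against the
# Hudi configuration reference for your version.
DELTASTREAMER_TO_HUDI: Dict[str, str] = {
    "--target-table": "hoodie.table.name",
    "--table-type": "hoodie.datasource.write.table.type",
    "--source-ordering-field": "hoodie.datasource.write.precombine.field",
    "--op": "hoodie.datasource.write.operation",
}

# Params that control how DeltaStreamer orchestrates runs rather than what
# gets written; the datasource writer has no equivalent, so they are skipped.
ORCHESTRATION_ONLY = {"--continuous", "--min-sync-interval-seconds"}


def to_hudi_write_options(
    cli_params: Mapping[str, str],
    extra_props: Mapping[str, str],
) -> Dict[str, str]:
    """Build a Hudi write-options map from DeltaStreamer-style params.

    extra_props holds hoodie.* settings (e.g. record key, partition path)
    that DeltaStreamer would normally read from its --props file.
    """
    options: Dict[str, str] = dict(extra_props)
    for flag, value in cli_params.items():
        if flag in ORCHESTRATION_ONLY:
            continue  # orchestration concern, handled by your own scheduler
        key = DELTASTREAMER_TO_HUDI.get(flag)
        if key is None:
            raise ValueError(f"no mapping defined for {flag}")
        if flag == "--op":
            # DeltaStreamer takes UPSERT/INSERT/BULK_INSERT; the datasource
            # option values are conventionally lowercase ("upsert", ...).
            value = value.lower()
        options[key] = value
    return options


# Example: params you might have passed to DeltaStreamer, plus one prop.
opts = to_hudi_write_options(
    {
        "--target-table": "trips",
        "--table-type": "MERGE_ON_READ",
        "--source-ordering-field": "ts",
        "--op": "UPSERT",
        "--continuous": "",
    },
    {"hoodie.datasource.write.recordkey.field": "uuid"},
)
# With PySpark, the map would then go to the datasource writer, e.g.:
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

Keeping the mapping as an explicit table makes any unmapped param fail loudly instead of being silently ignored. Running the write periodically and scheduling compaction for a MERGE_ON_READ table would still need to happen outside this code, as noted in point 2.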
