Kavin88 opened a new issue #3831:
URL: https://github.com/apache/hudi/issues/3831
spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4 \
--master yarn \
--deploy-mode cluster \
--conf spark.sql.shuffle.partitions=100 \
--driver-class-path $HADOOP_CONF_DIR \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
--table-type MERGE_ON_READ \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--source-ordering-field tst \
--target-base-path /user/hive/warehouse/stock_ticks_mor \
--target-table test \
--props /var/demo/config/kafka-source.properties
1. Can DeltaStreamer be used only as a CLI utility?
2. If it can be integrated into PySpark code like the datasource writer, how
do I pass the DeltaStreamer-specific parameters --props, --source-class, and
--continuous as Hudi config options?
3. Similarly, is it possible to pass the parameters from point 2 through Livy
for Spark submission?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]