[GitHub] [hudi] Kavin88 opened a new issue #3831: Deltastreamer through Pyspark/livy

GitBox Wed, 20 Oct 2021 02:55:03 -0700


Kavin88 opened a new issue #3831:
URL: https://github.com/apache/hudi/issues/3831



   spark-submit --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4 \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.sql.shuffle.partitions=100 \
    --driver-class-path $HADOOP_CONF_DIR \
    --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
    --table-type MERGE_ON_READ \
    --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
    --source-ordering-field tst  \
    --target-base-path /user/hive/warehouse/stock_ticks_mor \
    --target-table test \
    --props /var/demo/config/kafka-source.properties 
   
   1. Can DeltaStreamer be used only as a CLI utility?
   2. If it can be integrated into PySpark code like the datasource writer, how do I pass the DeltaStreamer-specific parameters --props, --source-class, and --continuous through the Hudi config options?
   3. Similarly, is it possible to pass the parameters from point 2 through Livy for Spark submission?
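
   To make point 3 concrete, here is a minimal sketch of what such a submission might look like, assuming Livy's batch endpoint (POST /batches) and its documented `file`, `className`, `args`, and `conf` fields. The bundle jar location is a hypothetical placeholder, the helper name `build_livy_batch_payload` is invented for illustration, and whether this works with a given Livy/Hudi combination is untested. The sketch only builds the JSON body and does not contact a server.

```python
import json

# Main class taken from the spark-submit command above.
DELTASTREAMER_CLASS = "org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer"


def build_livy_batch_payload(bundle_jar, props_path, continuous=False):
    """Build a JSON-serializable body for Livy's POST /batches.

    bundle_jar is a hypothetical location of the hudi-utilities bundle jar
    that the cluster can read; it is an assumption, not part of the issue.
    """
    # DeltaStreamer options are passed as plain program arguments,
    # mirroring the flags used in the spark-submit command.
    args = [
        "--table-type", "MERGE_ON_READ",
        "--source-class", "org.apache.hudi.utilities.sources.JsonKafkaSource",
        "--source-ordering-field", "tst",
        "--target-base-path", "/user/hive/warehouse/stock_ticks_mor",
        "--target-table", "test",
        "--props", props_path,
    ]
    if continuous:
        args.append("--continuous")

    return {
        "file": bundle_jar,
        "className": DELTASTREAMER_CLASS,
        "args": args,
        # Spark settings go in "conf" rather than on a command line.
        "conf": {
            "spark.sql.shuffle.partitions": "100",
            "spark.jars.packages": "org.apache.spark:spark-avro_2.11:2.4.4",
        },
    }


if __name__ == "__main__":
    payload = build_livy_batch_payload(
        "hdfs:///path/to/hudi-utilities-bundle.jar",  # hypothetical path
        "/var/demo/config/kafka-source.properties",
        continuous=True,
    )
    print(json.dumps(payload, indent=2))
```

   The resulting JSON would be sent as the request body to the Livy server's /batches endpoint with a Content-Type of application/json. In this sketch the DeltaStreamer flags travel in `args`, not in the Hudi datasource options.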


