Kavin88 edited a comment on issue #3831:
URL: https://github.com/apache/hudi/issues/3831#issuecomment-948232383
@xushiyan 1. Currently I run deltastreamer by calling spark-submit directly
on the EMR cluster. I would like to understand whether deltastreamer can be
used the same way as the Hudi datasource writer. The params we pass to the
datasource writer in PySpark are given below. I can't work out how to pass the
deltastreamer params from Python/Spark code or through a Livy submit: I can't
find how to pass --continuous, the source class name, the source ordering
field, etc. in the hudiOptions below. Is this viable?
hudiOptions = {
    "hoodie.table.name": "hudi_test",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "last_update_time",
    "hoodie.upsert.shuffle.parallelism": 1,
    "hoodie.insert.shuffle.parallelism": 1,
    "hoodie.datasource.write.storage.type": "MERGE_ON_READ",
}
(
    inputdf.write.format("org.apache.hudi")
    .option("hoodie.datasource.write.operation", "insert")
    .options(**hudiOptions)
    .mode("overwrite")
    .save("storagepath")
)
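To make the question concrete: as far as I understand, deltastreamer is a standalone Spark application configured through command-line flags rather than writer options, so the equivalent of the hudiOptions above would presumably be an argument list. Below is a minimal sketch of what I imagine a Livy batch submission would look like. The helper name `build_livy_batch`, the jar location, the target path, and the choice of `JsonKafkaSource` are all placeholders I made up for illustration, and I have not verified this against a running cluster.

```python
import json

# Main class of the DeltaStreamer utility (shipped in the hudi-utilities bundle).
DELTASTREAMER_CLASS = "org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer"


def build_livy_batch(bundle_jar, base_path, table_name, source_class,
                     ordering_field, hoodie_conf, continuous=False):
    """Build a JSON body for Livy's POST /batches (hypothetical helper)."""
    args = [
        "--table-type", "MERGE_ON_READ",
        "--source-class", source_class,
        "--source-ordering-field", ordering_field,
        "--target-base-path", base_path,
        "--target-table", table_name,
        "--op", "INSERT",
    ]
    if continuous:
        args.append("--continuous")
    # Remaining Hudi properties are passed as repeated --hoodie-conf key=value.
    for key, value in hoodie_conf.items():
        args += ["--hoodie-conf", f"{key}={value}"]
    return {"file": bundle_jar, "className": DELTASTREAMER_CLASS, "args": args}


payload = build_livy_batch(
    bundle_jar="s3://my-bucket/jars/hudi-utilities-bundle.jar",  # placeholder path
    base_path="s3://my-bucket/storagepath",                      # placeholder path
    table_name="hudi_test",
    source_class="org.apache.hudi.utilities.sources.JsonKafkaSource",  # assumed source
    ordering_field="last_update_time",
    hoodie_conf={
        "hoodie.datasource.write.recordkey.field": "id",
        "hoodie.upsert.shuffle.parallelism": 1,
        "hoodie.insert.shuffle.parallelism": 1,
    },
    continuous=True,
)
print(json.dumps(payload, indent=2))
```

If this is roughly right, the payload would be POSTed to the Livy `/batches` endpoint, and the same argument list would follow the jar path in a plain spark-submit call.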
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.