fanaticjo commented on issue #2975:
URL: https://github.com/apache/hudi/issues/2975#issuecomment-850544149


   Hello @calleo, I am trying to make this a generic feature in release 
0.9.0. Until then, you can clone this GitHub repo 
(https://github.com/fanaticjo/HudiJavaCustomUpsert/tree/master/src/main/java/com), 
build a jar out of it, and load it as a dependent jar alongside the Hudi and 
Avro jars for your PySpark job.
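   For loading those jars, one option is to pass them when building the 
SparkSession. This is only a sketch: the jar file names below are 
placeholders, so substitute the actual paths of your Hudi bundle, Avro jar, 
and the jar built from the repo above.

```python
from pyspark.sql import SparkSession

# Hypothetical paths; replace with the real jar locations on your machine.
spark = (
    SparkSession.builder
    .appName("hudi-custom-upsert")
    .config("spark.jars",
            "hudi-spark-bundle.jar,avro.jar,HudiJavaCustomUpsert.jar")
    # Hudi requires the Kryo serializer.
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)
```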
   
   In your PySpark code you can add these two options:
   "hoodie.update.keys": "admission_date,name",   # columns you want to update
   "hoodie.datasource.write.payload.class": "com.hudiUpsert.hudiCustomUpsert",   # the custom payload class from the jar
   
   Sample piece that is working for me:

   hudi_update_options_with_key = {
       "hoodie.table.name": "test",
       "hoodie.datasource.write.recordkey.field": "id",
       "hoodie.datasource.write.partitionpath.field": "dept",
       "hoodie.datasource.write.table.name": "test",
       "hoodie.index.type": "SIMPLE",
       "hoodie.update.keys": "admission_date,name",
       "hoodie.datasource.write.payload.class": "com.hudiUpsert.hudiCustomUpsert",
       "hoodie.datasource.write.operation": "upsert",
       "hoodie.datasource.write.precombine.field": "date",
       "hoodie.upsert.shuffle.parallelism": 2,
       "hoodie.insert.shuffle.parallelism": 2,
   }

   df.write.format("org.apache.hudi") \
       .options(**hudi_update_options_with_key) \
       .mode("append") \
       .save("/Users/biswajit/PycharmProjects/hudicustomupsert/output/")
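   To make the merge semantics concrete, here is a plain-Python sketch of 
what the custom payload class does on upsert (a hypothetical illustration, 
not the actual Java implementation): only the columns listed in 
"hoodie.update.keys" are taken from the incoming record, and every other 
column keeps its stored value.

```python
def partial_upsert(current: dict, incoming: dict, update_keys: str) -> dict:
    """Merge incoming into current, overwriting only the listed columns.

    Illustrates the payload-class behavior: columns named in
    "hoodie.update.keys" come from the incoming record; all other
    columns retain their stored values.
    """
    keys = [k.strip() for k in update_keys.split(",")]
    merged = dict(current)
    for k in keys:
        if k in incoming:
            merged[k] = incoming[k]
    return merged


stored = {"id": 1, "name": "old", "admission_date": "2020-01-01", "dept": "cs"}
incoming = {"id": 1, "name": "new", "admission_date": "2021-06-01", "dept": "ee"}

# "dept" keeps its stored value; only name and admission_date are overwritten.
print(partial_upsert(stored, incoming, "admission_date,name"))
```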
   
   

