fanaticjo commented on issue #2975: URL: https://github.com/apache/hudi/issues/2975#issuecomment-850544149
Hello @calleo, I am trying to make this a generic feature in release 0.9.0. Until then, you can clone this GitHub repo (https://github.com/fanaticjo/HudiJavaCustomUpsert/tree/master/src/main/java/com), build a jar out of it, and load it as a dependent jar alongside the Hudi and Avro jars for your PySpark job.

In your PySpark code you can add these two options:

```
"hoodie.update.keys": "admission_date,name",  # columns you want to update
"hoodie.datasource.write.payload.class": "com.hudiUpsert.hudiCustomUpsert",  # custom payload class from the jar
```

Sample snippet which is working for me:

```python
hudi_update_options_with_key = {
    'hoodie.table.name': "test",
    'hoodie.datasource.write.recordkey.field': 'id',
    'hoodie.datasource.write.partitionpath.field': 'dept',
    'hoodie.datasource.write.table.name': "test",
    "hoodie.index.type": "SIMPLE",
    "hoodie.update.keys": "admission_date,name",
    "hoodie.datasource.write.payload.class": "com.hudiUpsert.hudiCustomUpsert",
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.datasource.write.precombine.field': 'date',
    'hoodie.upsert.shuffle.parallelism': 2,
    'hoodie.insert.shuffle.parallelism': 2
}

df.write.format("org.apache.hudi") \
    .options(**hudi_update_options_with_key) \
    .mode("append") \
    .save("/Users/biswajit/PycharmProjects/hudicustomupsert/output/")
```
