Team,


Regarding PRECOMBINE_FIELD_OPT_KEY, I see the following description in the
documentation.



PRECOMBINE_FIELD_OPT_KEY

Property: hoodie.datasource.write.precombine.field, Default: ts
Field used in preCombining before actual write. When two records have the
same key value, we will pick the one with the largest value for the
precombine field, determined by Object.compareTo(..)



It says "same key value". What does "key value" denote here? Is it the
record key? I have to process small batches frequently, and a batch could
contain multiple records for the same record key. For example:



*Batch 1:*



*Request_id | Operation_name | field 1 | field 2 | transaction_timestamp*
123        | I              | Value 0 | value 0 | Fri May 15 20:23:26 GMT 2020





*Batch 2 :*



*Request_id | Operation_name | field 1 | field 2 | transaction_timestamp*
123        | U              | Value 1 | value 0 | Fri May 15 20:23:26 GMT 2020
123        | U              | Value 0 | value 2 | Fri May 15 20:23:26 GMT 2020
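If I understand the documentation correctly, precombine within a batch
works roughly like this (a plain-Python sketch of my understanding, not
Hudi code; the field names are taken from my example above):

```python
# Plain-Python sketch (not Hudi code) of how I understand precombine:
# within one batch, records sharing the same record key are reduced to
# the single record whose precombine field compares largest.
def precombine(records, record_key, precombine_field):
    latest = {}
    for rec in records:
        key = rec[record_key]
        if key not in latest or rec[precombine_field] > latest[key][precombine_field]:
            latest[key] = rec
    return list(latest.values())

batch2 = [
    {"request_id": 123, "op": "U", "field1": "Value 1", "field2": "value 0",
     "ts": "Fri May 15 20:23:26 GMT 2020"},
    {"request_id": 123, "op": "U", "field1": "Value 0", "field2": "value 2",
     "ts": "Fri May 15 20:23:26 GMT 2020"},
]
# Both records share request_id 123 and the same timestamp, so only one
# of the two genuine updates survives precombining.
print(len(precombine(batch2, "request_id", "ts")))  # -> 1
```

This is exactly my concern: with identical timestamps, one of the two
updates in Batch 2 would be silently dropped.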



Here Batch 1 gets processed, and then after 10 minutes I need to process
Batch 2 using Hudi. Basically, I don't want the PRECOMBINE_FIELD_OPT_KEY
operation applied; I want to handle the Batch 2 updates logically myself.
Batch 2 has two updates for the same record key, and I need to logically
combine them and then upsert to HDFS.
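What I would like to do instead is combine the two Batch 2 updates myself
before the upsert, along these lines (a hypothetical sketch only; the
per-field merge rule and the use of None for "field not changed" are my
own assumptions for illustration):

```python
# Hypothetical sketch of the logical combine I have in mind: apply the
# updates for one record key in order, each update overwriting only the
# fields it actually changes, so no genuine update is lost.
def combine_updates(updates, record_key):
    merged = {}
    for upd in updates:
        key = upd[record_key]
        # None means "this update does not touch the field" (my convention)
        merged.setdefault(key, {}).update(
            {k: v for k, v in upd.items() if v is not None}
        )
    return list(merged.values())

batch2 = [
    {"request_id": 123, "field1": "Value 1", "field2": None},
    {"request_id": 123, "field1": None, "field2": "value 2"},
]
print(combine_updates(batch2, "request_id"))
# -> [{'request_id': 123, 'field1': 'Value 1', 'field2': 'value 2'}]
```

The combined record keeps both changes, which is the result I want to
upsert.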



PRECOMBINE_FIELD_OPT_KEY seems to be mandatory, and it throws an error if
my table has no ts column. I also don't want Hudi to precombine at all,
since it could eliminate some of my genuine records. What is your
suggestion for handling this scenario?



Thanks,

Selva
