Selva,
     Can you try setting the property "hoodie.combine.before.upsert" to
false? By default it is set to true, so Hudi tries to precombine the incoming
records before the upsert, and hence you see the error.
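
For reference, a minimal sketch of the write options, assuming a PySpark job.
The table name and the choice of transaction_timestamp as the precombine
field are assumptions made for illustration, and whether the precombine field
must still name an existing column may depend on your Hudi version:

```python
# Hudi write options for an upsert that skips the precombine step.
hudi_options = {
    # Hypothetical table name, not taken from the thread.
    "hoodie.table.name": "requests",
    "hoodie.datasource.write.operation": "upsert",
    # Record key column from the example batches.
    "hoodie.datasource.write.recordkey.field": "Request_id",
    # Assumption: point the precombine field at a column that exists,
    # since the table has no "ts" column.
    "hoodie.datasource.write.precombine.field": "transaction_timestamp",
    # Disable combining of incoming records before the upsert.
    "hoodie.combine.before.upsert": "false",
}

# With Spark and the Hudi bundle available, the options would be passed as:
# df.write.format("org.apache.hudi").options(**hudi_options) \
#     .mode("append").save(base_path)
```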

Regarding your question on key values:
Non-partitioned dataset (global index) : records with the same record key
are combined if precombine is enabled.
Partitioned dataset (non-global index) : records with the same HoodieKey
(record key plus partition path) are combined if precombine is enabled.
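
To make the difference concrete, here is a small plain-Python sketch of the
combining rule. It is not Hudi code: the precombine helper, the "partition"
and "ts" fields, and the sample values are hypothetical, and the tie-breaking
(keep the first record seen) is an assumption of the sketch.

```python
from typing import Callable, Dict, Hashable, List

Record = Dict[str, object]


def precombine(
    records: List[Record],
    key_fn: Callable[[Record], Hashable],
    precombine_field: str,
) -> List[Record]:
    """Keep, for each key, the record with the largest precombine value."""
    latest: Dict[Hashable, Record] = {}
    for rec in records:
        key = key_fn(rec)
        kept = latest.get(key)
        # Assumption: on a tie, the record seen first is kept.
        if kept is None or rec[precombine_field] > kept[precombine_field]:
            latest[key] = rec
    return list(latest.values())


batch = [
    {"request_id": 123, "partition": "2020/05/15", "field_2": "value 0", "ts": 1},
    {"request_id": 123, "partition": "2020/05/15", "field_2": "value 2", "ts": 2},
    {"request_id": 123, "partition": "2020/05/16", "field_2": "value 3", "ts": 1},
]

# Global index: the key is the record key alone, so one record survives.
by_record_key = precombine(batch, lambda r: r["request_id"], "ts")

# Non-global index: the key is (record key, partition path),
# so one record survives per partition.
by_hoodie_key = precombine(
    batch, lambda r: (r["request_id"], r["partition"]), "ts"
)
```

Note that two updates with the same key and the same precombine value, as in
your Batch 2, would collapse into a single record under this rule, which is
why you would disable it and merge them yourself before the upsert.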


On Sat, May 16, 2020 at 5:37 PM selvaraj periyasamy <
[email protected]> wrote:

> Team,
>
>
>
> Regarding PRECOMBINE_FIELD_OPT_KEY,  I see the below description in the
> documentation.
>
>
>
> PRECOMBINE_FIELD_OPT_KEY
>
> Property: hoodie.datasource.write.precombine.field, Default: ts
> Field used in preCombining before actual write. When two records have the
> same key value, we will pick the one with the largest value for the
> precombine field, determined by Object.compareTo(..)
>
>
>
> It says "same key value". What does key value denote here? Is it the
> record key? I have to process small batches of records frequently, and a
> batch could contain several records for the same record key. For example:
>
>
>
> *Batch 1:*
>
> Request_id | Operation_name | field 1 | field 2 | transaction_timestamp
> 123        | I              | Value 0 | value 0 | Fri May 15 20:23:26 GMT 2020
>
> *Batch 2:*
>
> Request_id | Operation_name | field 1 | field 2 | transaction_timestamp
> 123        | U              | Value 1 | value 0 | Fri May 15 20:23:26 GMT 2020
> 123        | U              | Value 0 | value 2 | Fri May 15 20:23:26 GMT 2020
>
>
>
> Here Batch 1 got processed, and then after 10 min I need to process Batch
> 2 using Hudi. Basically, I don't want the PRECOMBINE_FIELD_OPT_KEY
> operation; I want to handle the Batch 2 updates logically myself, because
> Batch 2 will have 2 updates and I need to logically combine them and
> upsert to HDFS.
>
>
>
> PRECOMBINE_FIELD_OPT_KEY seems to be mandatory, and it throws an error if
> I don't have a ts column in my table. I also don't want Hudi to do the
> precombine, as it could eliminate some of my genuine records. What is
> your suggestion on handling this scenario?
>
>
>
> Thanks,
>
> Selva
>


-- 
Regards,
-Sivabalan
