kazdy commented on issue #6869: URL: https://github.com/apache/hudi/issues/6869#issuecomment-1270599382
Hi, the record key works like a primary key on a table (a unique, non-nullable field). In your case you end up with two records with different record keys, and that's expected. Precombine and upserts are there to maintain the uniqueness of the record key.

To make it easier, assume you use only the delivery field as the record key. If you have a record with delivery:3000, Hudi will do an insert (provided no record with the same record key exists in the table). If you have a record with delivery:2000 and that key already exists in the table, Hudi will do an update.

Precombine works before the write: the incoming batch of data is deduplicated based on the record key and the precombine field. So if the incoming batch has two records with the same record key, the one with the greater precombine field value is passed to the write operation.
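
To illustrate the behaviour described in this comment, here is a minimal pure-Python sketch. It is a simplified model and not Hudi's actual implementation or API: the table is a plain dict keyed by record key, and the `ts` precombine field, the `status` column, and the helper names `precombine` and `upsert` are all hypothetical. Only the `delivery` record key comes from the discussion.

```python
def precombine(batch, record_key, precombine_field):
    """Deduplicate a batch: keep, per record key, the record with the
    greatest precombine value (simplified model, not Hudi's code)."""
    latest = {}
    for record in batch:
        key = record[record_key]
        current = latest.get(key)
        if current is None or record[precombine_field] > current[precombine_field]:
            latest[key] = record
    return list(latest.values())


def upsert(table, batch, record_key, precombine_field):
    """Insert records whose key is new, update records whose key exists."""
    for record in precombine(batch, record_key, precombine_field):
        # Assigning by key covers both cases: a new key is an insert,
        # an existing key is an update.
        table[record[record_key]] = record
    return table


# Hypothetical data: delivery 2000 is already in the table.
table = {2000: {"delivery": 2000, "ts": 1, "status": "created"}}
batch = [
    {"delivery": 2000, "ts": 2, "status": "shipped"},
    {"delivery": 2000, "ts": 3, "status": "delivered"},  # wins precombine
    {"delivery": 3000, "ts": 1, "status": "created"},    # new key, inserted
]
upsert(table, batch, "delivery", "ts")
```

After the call, the table still holds exactly one record per `delivery` value: the record for 2000 is updated to the version with the greater `ts`, and 3000 is inserted as a new record.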
