Anton,

Yes, you can achieve this with Hudi. Hudi uses a HoodieRecordPayload
implementation to merge old and new records. You can define a source
ordering field (here "sort_key") that determines which record is kept as
the latest one. The DefaultHoodieRecordPayload supports this ->
https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java

You just need to set the correct source ordering field name. Take a look at
an example here ->
https://github.com/apache/hudi/blob/master/hudi-common/src/test/java/org/apache/hudi/common/model/TestDefaultHoodieRecordPayload.java#L44
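
To make the merge rule concrete, here is a rough sketch in plain Python. It
is not Hudi code: merge() and apply_changes() are hypothetical helpers that
only imitate the "keep the record with the larger ordering value" behaviour
described above, and the "id" key column is made up for the illustration.
The option names in hudi_options are the ones I believe the writer uses for
the payload class and ordering field; treat them as assumptions and check
them against the docs for your Hudi version. Letting ties go to the incoming
record is also a choice made for this sketch.

```python
from typing import Dict, Iterable, Optional

# Assumed writer option names (verify against your Hudi version's docs).
hudi_options = {
    "hoodie.datasource.write.precombine.field": "sort_key",
    "hoodie.datasource.write.payload.class":
        "org.apache.hudi.common.model.DefaultHoodieRecordPayload",
    "hoodie.payload.ordering.field": "sort_key",
}


def merge(existing: Optional[Dict], incoming: Dict,
          ordering_field: str = "sort_key") -> Dict:
    """Hypothetical helper: pick the record to keep for one key."""
    if existing is None:
        # Nothing stored yet for this key, so the incoming record wins.
        return incoming
    if incoming[ordering_field] < existing[ordering_field]:
        # Incoming record is older than what is stored: ignore it.
        return existing
    # Incoming record is newer (or equal, by this sketch's choice).
    return incoming


def apply_changes(table: Dict, changes: Iterable[Dict],
                  key_field: str = "id") -> Dict:
    """Hypothetical helper: apply change records in arrival order."""
    for record in changes:
        key = record[key_field]
        table[key] = merge(table.get(key), record)
    return table


# Change records arriving out of order for the same key.
changes = [
    {"id": 1, "sort_key": 5, "value": "newest"},
    {"id": 1, "sort_key": 3, "value": "stale"},
    {"id": 2, "sort_key": 1, "value": "first"},
    {"id": 2, "sort_key": 2, "value": "second"},
]
table = apply_changes({}, changes)
```

With these inputs, the late-arriving record with sort_key 3 does not
overwrite the stored record with sort_key 5, while key 2 ends up with the
sort_key 2 record.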

Please create a GitHub issue or post in the general Slack channel if you
need further help.

Thanks,
Nishith

On Sat, Jan 30, 2021 at 6:59 PM Anton Zuyeu <[email protected]> wrote:

> Hi Hudi team,
>
> We are replicating a database table by reading its change logs and applying
> them to a Hudi table, and we would like our pipeline to be able to
> process records out of order. Essentially, we want to introduce a column
> "sort_key" and update an existing record in the Hudi table only if the new
> record's sort_key is greater than the sort_key value of the existing record.
> Initially we thought that we just needed to set the
> hoodie.datasource.write.precombine.field
> parameter to "sort_key"; however, that does not appear to be the case, as
> hoodie.datasource.write.precombine.field comes into play only when
> pre-combining records prior to writing. Is there a way to implement our use
> case using Hudi's primitives?
>
> Thank you,
> Anton
>
