[
https://issues.apache.org/jira/browse/HUDI-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Guo updated HUDI-6798:
----------------------------
Story Points: 10 (was: 3)
> Implement event-time-based merging mode in FileGroupReader
> ----------------------------------------------------------
>
> Key: HUDI-6798
> URL: https://issues.apache.org/jira/browse/HUDI-6798
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
>
> To achieve this, we should add a new table config
> {{hoodie.record.merge.mode}} to control the record merging mode and behavior
> in the new file group reader ({{HoodieFileGroupReader}}) and implement
> event-time ordering in it. The table config {{hoodie.record.merge.mode}} is
> going to be the single config that determines how record merging happens
> in release 1.0 and beyond.
>
> Three merging modes to define:
> * {{OVERWRITE_WITH_LATEST}}: uses transaction time to merge records,
> i.e., the record from the later transaction overwrites the earlier record with
> the same key. This corresponds to the behavior of the existing payload class
> {{OverwriteWithLatestAvroPayload}}.
> * {{EVENT_TIME_ORDERING}}: uses event time as the ordering to merge
> records, i.e., the record with the larger event time overwrites the record
> with the smaller event time on the same key, regardless of transaction time.
> The event time or preCombine field needs to be specified by the user. This
> corresponds to the behavior of the existing payload class
> {{DefaultHoodieRecordPayload}}.
> * {{CUSTOM}}: uses custom merging logic specified by the user. When a
> user specifies a custom record merger strategy, or a payload class with the
> Avro record merger, this mode is to be specified so that record merging
> follows the user-defined logic as before.
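
The three modes described above can be illustrated with a minimal, standalone Java sketch. It is not Hudi code: only the mode names come from the issue description, while the `Rec` class, its fields, the `merge` helper, and the tie-breaking rule (the incoming record wins on equal times) are assumptions made for illustration.

```java
import java.util.function.BinaryOperator;

// Standalone illustration of the three merging modes named in HUDI-6798.
// Nothing here is part of the Hudi API; all types are hypothetical.
class MergeModeSketch {

    // Mode names as listed in the issue description.
    enum RecordMergeMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING, CUSTOM }

    // Hypothetical record: commitTime stands in for transaction time,
    // eventTime for the user-specified ordering (preCombine) field.
    static final class Rec {
        final String key;
        final long commitTime;
        final long eventTime;
        final String value;

        Rec(String key, long commitTime, long eventTime, String value) {
            this.key = key;
            this.commitTime = commitTime;
            this.eventTime = eventTime;
            this.value = value;
        }
    }

    // Picks the surviving record for two records with the same key.
    // Assumption: on equal times, the incoming record wins.
    static Rec merge(RecordMergeMode mode, Rec existing, Rec incoming,
                     BinaryOperator<Rec> customMerger) {
        switch (mode) {
            case OVERWRITE_WITH_LATEST:
                // Later transaction overwrites the earlier one.
                return incoming.commitTime >= existing.commitTime ? incoming : existing;
            case EVENT_TIME_ORDERING:
                // Larger event time wins, regardless of transaction time.
                return incoming.eventTime >= existing.eventTime ? incoming : existing;
            case CUSTOM:
                // Delegate entirely to user-defined logic.
                return customMerger.apply(existing, incoming);
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        Rec existing = new Rec("k1", 1L, 200L, "early-commit-late-event");
        Rec incoming = new Rec("k1", 2L, 100L, "late-commit-early-event");

        for (RecordMergeMode mode : RecordMergeMode.values()) {
            // The custom merger here simply keeps the existing record.
            Rec winner = merge(mode, existing, incoming, (a, b) -> a);
            System.out.println(mode + " -> " + winner.value);
        }
    }
}
```

The two sample records are chosen so that transaction time and event time disagree, which is the case where `OVERWRITE_WITH_LATEST` and `EVENT_TIME_ORDERING` select different winners.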
--
This message was sent by Atlassian Jira
(v8.20.10#820010)