[
https://issues.apache.org/jira/browse/HUDI-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Guo updated HUDI-6798:
----------------------------
Description:
To achieve this, we should add a new table config {{hoodie.record.merge.mode}}
to control the record merging mode and behavior in the new file group reader
({{HoodieFileGroupReader}}) and implement event-time ordering in it. The table
config {{hoodie.record.merge.mode}} is going to be the single config that
determines how record merging happens in release 1.0 and beyond.
Three merging modes are to be defined:
* {{OVERWRITE_WITH_LATEST}}: uses transaction time to merge records, i.e., the
record from the later transaction overwrites the earlier record with the same
key. This corresponds to the behavior of the existing payload class
{{OverwriteWithLatestAvroPayload}}.
* {{EVENT_TIME_ORDERING}}: uses event time as the ordering to merge records,
i.e., the record with the larger event time overwrites the record with the
smaller event time for the same key, regardless of transaction time. The event
time or preCombine field needs to be specified by the user. This corresponds to
the behavior of the existing payload class {{DefaultHoodieRecordPayload}}.
* {{CUSTOM}}: uses custom merging logic specified by the user. When a user
specifies a custom record merger strategy, or a payload class with the Avro
record merger, this mode is to be set so that record merging follows the
user-defined logic as before.
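
As a rough illustration of how the three modes above differ, here is a minimal
Python sketch. It is not Hudi code: {{Record}}, {{resolve_merger}}, and the two
merger functions are hypothetical names made up for this example, and the
tie-breaking rule (the incoming record wins when times are equal) is an
assumption, since the description does not specify it. Only the config key and
the three mode names come from the description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


# Hypothetical record model for illustration; not Hudi's record classes.
@dataclass(frozen=True)
class Record:
    key: str
    value: str
    commit_time: int  # transaction time of the write
    event_time: int   # user-specified ordering (preCombine) field


Merger = Callable[[Record, Record], Record]


def overwrite_with_latest(existing: Record, incoming: Record) -> Record:
    # The record from the later transaction wins.
    # Assumption: on equal commit times the incoming record wins.
    return incoming if incoming.commit_time >= existing.commit_time else existing


def event_time_ordering(existing: Record, incoming: Record) -> Record:
    # The record with the larger event time wins, regardless of commit time.
    # Assumption: on equal event times the incoming record wins.
    return incoming if incoming.event_time >= existing.event_time else existing


def resolve_merger(table_config: Dict[str, str],
                   custom_merger: Optional[Merger] = None) -> Merger:
    # A single table config selects the merging behavior.
    mode = table_config["hoodie.record.merge.mode"]
    if mode == "OVERWRITE_WITH_LATEST":
        return overwrite_with_latest
    if mode == "EVENT_TIME_ORDERING":
        return event_time_ordering
    if mode == "CUSTOM":
        if custom_merger is None:
            raise ValueError("CUSTOM mode requires user-defined merging logic")
        return custom_merger
    raise ValueError(f"unknown merge mode: {mode}")


# A late-arriving event: committed second, but with a smaller event time.
first = Record("k1", "v1", commit_time=1, event_time=20)
late = Record("k1", "v2", commit_time=2, event_time=10)
for mode in ("OVERWRITE_WITH_LATEST", "EVENT_TIME_ORDERING"):
    merge = resolve_merger({"hoodie.record.merge.mode": mode})
    print(mode, "->", merge(first, late).value)
```

With these two records, {{OVERWRITE_WITH_LATEST}} keeps {{v2}} (the later
transaction) while {{EVENT_TIME_ORDERING}} keeps {{v1}} (the larger event
time), which is the difference the two modes are meant to capture.
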
was: To achieve this, we should add a new table config
{{hoodie.record.merge.mode}} to control the record merging mode and behavior in
the new file group reader ({{HoodieFileGroupReader}}) and implement
event-time ordering in it. The table config {{hoodie.record.merge.mode}} is
going to be the single config that determines how the record merging happens in
release 1.0 and beyond.
> Implement event-time-based merging mode in FileGroupReader
> ----------------------------------------------------------
>
> Key: HUDI-6798
> URL: https://issues.apache.org/jira/browse/HUDI-6798
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0