[
https://issues.apache.org/jira/browse/HUDI-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Danny Chen updated HUDI-9267:
-----------------------------
Description:
A newer commit always takes priority in the merge comparison, but what about
multiple updates to one key within the same commit, specifically within the
same log data block? Which payload should we choose then?
For streaming systems, we should always keep the latest incoming record, which
gives row-level override semantics, while batch writers may not care about the
ordering inside one commit.
Let's keep the merge order in line with the old file slice reader code
path.
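
A minimal sketch of the "latest incoming record wins" ordering described
above, assuming blocks are supplied oldest commit first and records inside a
block are in arrival order. The `Record` shape and `merge_log_blocks` are
hypothetical names for illustration only, not Hudi APIs, and the sketch ignores
ordering fields and custom payload merging.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape for illustration: (record_key, payload).
Record = Tuple[str, str]


def merge_log_blocks(blocks: Iterable[List[Record]]) -> Dict[str, str]:
    """Merge log data blocks so that the last-seen record for a key wins.

    Assumption: `blocks` is ordered oldest commit first, and each block
    lists its records in the order they were written.
    """
    merged: Dict[str, str] = {}
    for block in blocks:
        for key, payload in block:
            # A later record overrides an earlier one, both across
            # commits and within the same log data block.
            merged[key] = payload
    return merged


if __name__ == "__main__":
    blocks = [
        [("k1", "a"), ("k2", "b"), ("k1", "c")],  # two updates to k1 in one block
        [("k2", "d")],                            # newer commit updates k2
    ]
    print(merge_log_blocks(blocks))
```

Here the second update to `k1` inside the first block overrides the first one,
which is the row-level override behavior a streaming writer would expect.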
> Fix the file group reader log file read sequence
> ------------------------------------------------
>
> Key: HUDI-9267
> URL: https://issues.apache.org/jira/browse/HUDI-9267
> Project: Apache Hudi
> Issue Type: New Feature
> Components: reader-core
> Reporter: Danny Chen
> Assignee: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.2