[ 
https://issues.apache.org/jira/browse/HUDI-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-9267:
-----------------------------
    Description: 
A newer commit always takes priority when records are merged, but what about 
multiple updates to one key within the same commit, specifically multiple 
updates to one key in the same log data block? Which payload should we choose 
then?

For streaming systems, we should always keep the latest incoming record so 
that writes behave as a row-level override; for batch writes, the order inside 
one commit may not matter.

Let's keep the merging sequence in line with the old file slice reader code 
path.
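
To illustrate the ordering in question, here is a minimal sketch in Python. It 
is not Hudi code: the record shape (key, commit_time, payload), the function 
name merge_log_records, and the string comparison of commit times are all 
assumptions made for this example. It models only commit-time ordering and 
ignores ordering fields and custom mergers. The point is that when two records 
for one key carry the same commit time, the one read later replaces the one 
read earlier.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape for illustration: (key, commit_time, payload).
# Commit times are assumed to be fixed-width strings, so string comparison
# matches chronological order.
Record = Tuple[str, str, str]


def merge_log_records(blocks: Iterable[List[Record]]) -> Dict[str, Record]:
    """Merge records from log data blocks that are read in write order.

    A record replaces the stored one for its key when its commit time is
    greater than or equal to the stored commit time. The "or equal" part
    is what makes the latest incoming record win inside a single commit.
    """
    merged: Dict[str, Record] = {}
    for block in blocks:
        for record in block:
            key, commit_time, _payload = record
            current = merged.get(key)
            if current is None or commit_time >= current[1]:
                merged[key] = record
    return merged


blocks = [
    [("k1", "001", "a"), ("k2", "001", "x")],
    # Two updates for k1 in the same commit and the same data block.
    [("k1", "002", "b"), ("k1", "002", "c")],
]
result = merge_log_records(blocks)
```

With a strict ">" comparison instead of ">=", the first record of the same 
commit would be kept, which is the opposite of the row-level override wanted 
for streaming writes.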

> Fix the file group reader log file read sequence
> ------------------------------------------------
>
>                 Key: HUDI-9267
>                 URL: https://issues.apache.org/jira/browse/HUDI-9267
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: reader-core
>            Reporter: Danny Chen
>            Assignee: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.2
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
