[
https://issues.apache.org/jira/browse/HUDI-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506675#comment-17506675
]
sivabalan narayanan commented on HUDI-2751:
-------------------------------------------
[~danny0405]: We have already added support for preserving commit metadata during compaction
[https://github.com/apache/hudi/pull/4811].
Does that satisfy this Jira, or is anything more needed?
> To avoid the duplicates for streaming read MOR table
> ----------------------------------------------------
>
> Key: HUDI-2751
> URL: https://issues.apache.org/jira/browse/HUDI-2751
> Project: Apache Hudi
> Issue Type: Task
> Components: Common Core
> Reporter: Danny Chen
> Assignee: sivabalan narayanan
> Priority: Blocker
> Fix For: 0.11.0
>
>
> Imagine there are commits on the timeline:
> {code:java}
>                    inflight compaction                   complete compaction
>                             |                                     |
> ----- instant 99 ----- instant 100 ----- 101 ----- 102 ----- instant 100 -----
>       first read ->|                                         second read ->|
> ---- range 1 ------|----------------------- range 2 -----------------------|
> {code}
> Instants 99, 101, and 102 are successful non-compaction delta commits;
> instant 100 is a compaction instant.
> The first incremental read consumes up to instant 99, and the second read consumes from
> instant 100 to instant 102. The second read would therefore consume the commit files
> of instant 100, which have already been consumed before.
> The duplicate reading happens when this condition is met: a compaction
> instant is scheduled and then completes within *one* consume range.
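
The scenario described above can be reproduced with a small toy simulation. This is only a sketch under stated assumptions, not Hudi code: `Row`, `Instant`, `compact`, `read`, and `secondRead` are hypothetical names, and the row-level commit-time filter in `read` is an assumed reader behavior used to show why preserving the original commit time during compaction could avoid the duplicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these types are Hudi classes.
class DuplicateReadSketch {

    // A row as seen by the reader: its key and the commit time stamped on it.
    static final class Row {
        final String key;
        final int commitTime;
        Row(String key, int commitTime) { this.key = key; this.commitTime = commitTime; }
    }

    // The rows contained in the files written by one completed instant.
    static final class Instant {
        final int time;
        final List<Row> rows;
        Instant(int time, List<Row> rows) { this.time = time; this.rows = rows; }
    }

    // Rewrites existing rows under a compaction instant. Without preservation,
    // every rewritten row is re-stamped with the compaction instant time.
    static Instant compact(int compactionTime, List<Row> input, boolean preserveCommitTime) {
        List<Row> out = new ArrayList<>();
        for (Row r : input) {
            out.add(new Row(r.key, preserveCommitTime ? r.commitTime : compactionTime));
        }
        return new Instant(compactionTime, out);
    }

    // Incremental read over (startExclusive, endInclusive]: scan the files of
    // instants in range and keep rows whose stamped commit time is in range.
    // ASSUMPTION: this row-level filter is illustrative, not a Hudi API.
    static List<String> read(List<Instant> completed, int startExclusive, int endInclusive) {
        List<String> keys = new ArrayList<>();
        for (Instant i : completed) {
            if (i.time <= startExclusive || i.time > endInclusive) {
                continue;
            }
            for (Row r : i.rows) {
                if (r.commitTime > startExclusive && r.commitTime <= endInclusive) {
                    keys.add(r.key);
                }
            }
        }
        return keys;
    }

    // The first read has consumed up to instant 99 (row "a").
    // Returns the keys emitted by the second read over (99, 102].
    static List<String> secondRead(boolean preserveCommitTime) {
        Instant d99 = new Instant(99, Arrays.asList(new Row("a", 99)));
        Instant d101 = new Instant(101, Arrays.asList(new Row("b", 101)));
        Instant d102 = new Instant(102, Arrays.asList(new Row("c", 102)));
        // Compaction 100 is scheduled and completes inside the second range.
        Instant c100 = compact(100, d99.rows, preserveCommitTime);
        return read(Arrays.asList(d99, c100, d101, d102), 99, 102);
    }

    public static void main(String[] args) {
        System.out.println(secondRead(false)); // "a" is emitted a second time
        System.out.println(secondRead(true));  // "a" is filtered out
    }
}
```

In this model, the second read re-emits row "a" when compaction re-stamps it with instant 100, and skips it when the original commit time 99 is kept.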
--
This message was sent by Atlassian Jira
(v8.20.1#820001)