[
https://issues.apache.org/jira/browse/HUDI-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Guo updated HUDI-5176:
----------------------------
Labels: (was: incremen)
> Incremental source may miss commits if there are inflight commits before
> completed commits
> ------------------------------------------------------------------------------------------
>
> Key: HUDI-5176
> URL: https://issues.apache.org/jira/browse/HUDI-5176
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Fix For: 0.12.2
>
>
> Consider the following scenario of concurrent writers. Writer 1 starts a
> commit at t1 and later writer 2 starts another commit at t2 (t2 > t1). Commit
> t2 finishes earlier than t1.
> {code:java}
> ---------------------------------------------------------> t
> instant t1 |------------------------------| (writer 1)
> instant t2        |--------------| (writer 2) {code}
> This leaves an inflight commit (t1) before a completed commit (t2) on the
> Hudi timeline. The incremental pull uses only completed commits to determine
> the start and end instants for the incremental query and to advance the
> checkpoint. Once the checkpoint advances to t2, t1 falls before the checkpoint
> even after it completes, so the data for such inflight commits may never be
> pulled from the incremental source.
>
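The scenario above can be reproduced with a small simulation. This is only a sketch under stated assumptions, not Hudi code: `Instant` and `incremental_pull` are hypothetical names, integer timestamps stand in for Hudi instant times, and the pull logic is a simplified model of "use only completed commits and advance the checkpoint to the latest one".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instant:
    """Hypothetical stand-in for a commit on the timeline (not a Hudi class)."""
    ts: int          # instant time; orders commits on the timeline
    completed: bool  # False while the commit is still inflight


def incremental_pull(timeline: List[Instant], checkpoint: int) -> Tuple[List[int], int]:
    """Simplified model: pull completed instants strictly after the checkpoint,
    then advance the checkpoint to the latest instant that was pulled."""
    pulled = sorted(i.ts for i in timeline if i.completed and i.ts > checkpoint)
    new_checkpoint = pulled[-1] if pulled else checkpoint
    return pulled, new_checkpoint


# t1 (writer 1) is still inflight while t2 (writer 2) has already completed.
t1 = Instant(ts=1, completed=False)
t2 = Instant(ts=2, completed=True)
timeline = [t1, t2]

# First pull sees only t2 and moves the checkpoint past t1.
pulled, checkpoint = incremental_pull(timeline, checkpoint=0)
print(pulled, checkpoint)  # [2] 2

# Writer 1 finishes later, but t1 is now behind the checkpoint.
t1.completed = True
pulled, checkpoint = incremental_pull(timeline, checkpoint)
print(pulled, checkpoint)  # [] 2
```

In this model the second pull returns nothing, so the data written at t1 is never read, which matches the behavior described in the issue.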
--
This message was sent by Atlassian Jira
(v8.20.10#820010)