[
https://issues.apache.org/jira/browse/HUDI-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Guo updated HUDI-5176:
----------------------------
Description:
Consider the following scenario of concurrent writers. Writer 1 starts a commit
at t1 and later writer 2 starts another commit at t2 (t2 > t1). Commit t2
finishes earlier than t1.
{code:java}
---------------------------------------------------------> t
instant t1 |------------------------------|  (writer 1)
instant t2        |--------------|           (writer 2)
{code}
This leaves an inflight commit (t1) before a completed commit (t2) on the Hudi
timeline. The incremental pull uses only completed commits to determine the
start and end instants of the incremental query and to advance the checkpoint.
The checkpoint can therefore advance to t2 while t1 is still inflight. When t1
later completes, its instant time is earlier than the checkpoint, so the data
for such inflight commits may never be pulled from the incremental source.
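
The scenario can be illustrated with a small, self-contained simulation. This is only a sketch of the behavior described above, not Hudi's actual implementation: the `Instant` class and the `incremental_pull` function are hypothetical names, and instant times are modeled as plain integers (t1 = 1, t2 = 2).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instant:
    """Hypothetical stand-in for an instant on the timeline."""
    time: int        # instant (start) time; defines the timeline order
    completed: bool  # False means the commit is still inflight


def incremental_pull(timeline: List[Instant], checkpoint: int) -> Tuple[List[int], int]:
    """Assumed pull logic: consider completed commits only.

    Returns the instant times pulled in this round and the new checkpoint,
    which advances to the latest completed instant after the old checkpoint.
    """
    pulled = sorted(i.time for i in timeline if i.completed and i.time > checkpoint)
    new_checkpoint = pulled[-1] if pulled else checkpoint
    return pulled, new_checkpoint


# Writer 1 (t1 = 1) is still inflight; writer 2 (t2 = 2) has already completed.
timeline = [Instant(time=1, completed=False), Instant(time=2, completed=True)]

# First pull: only t2 is visible, so the checkpoint advances to t2.
first_pull, checkpoint = incremental_pull(timeline, checkpoint=0)

# Writer 1 finishes after the checkpoint has already moved past t1.
timeline[0].completed = True

# Second pull: t1 is now completed, but t1 < checkpoint, so it is skipped.
second_pull, checkpoint_after = incremental_pull(timeline, checkpoint)

print(first_pull, checkpoint)         # pulled instants and checkpoint, round 1
print(second_pull, checkpoint_after)  # pulled instants and checkpoint, round 2
```

In this model the second pull returns nothing even though t1 has completed, which is the missed-commit behavior reported here.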
> Incremental source may miss commits if there are inflight commits before
> completed commits
> ------------------------------------------------------------------------------------------
>
> Key: HUDI-5176
> URL: https://issues.apache.org/jira/browse/HUDI-5176
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ethan Guo
> Priority: Major
>