Hui An created HUDI-5517:
----------------------------
Summary: HoodieTimeline support filter instants by state
transition time
Key: HUDI-5517
URL: https://issues.apache.org/jira/browse/HUDI-5517
Project: Apache Hudi
Issue Type: New Feature
Components: core, timeline-server
Reporter: Hui An
The Hudi timeline can miss instants during incremental pulls from an upstream
Hudi table that is written by several writers.
For example, say 2 writers are writing data to the Hudi table, and the end
timestamp of the last successful incremental pull is 20230101160756490.
w1 is writing 20230101160759320 and w2 is writing 20230101160803680. If w2
finishes earlier than w1, the incremental pull end timestamp is updated to
20230101160803680, and w1's commit 20230101160759320 is skipped by the next
pull because its instant time is earlier than w2's.
We actually need to use the commit end time (state transition time) to filter
commits for incremental pulls. Since w1's state transition time is later than
w2's, w1's commit won't be skipped.
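The scenario above can be sketched as a small standalone simulation. This is
not Hudi code: the Instant class, the incremental_pull helper, and the two
completion timestamps (20230101160810000 for w2, 20230101160815000 for w1) are
assumptions made up for illustration. Only the instant times and the last
checkpoint come from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    # Hypothetical stand-in for a completed instant; not the Hudi API.
    writer: str
    instant_time: str            # when the commit was started
    state_transition_time: str   # when the commit completed (assumed values)


def incremental_pull(completed, checkpoint, key):
    """Return the instants whose `key` field is after `checkpoint`,
    together with the checkpoint to use for the next pull."""
    pulled = sorted(
        (i for i in completed if getattr(i, key) > checkpoint),
        key=lambda i: getattr(i, key),
    )
    new_checkpoint = getattr(pulled[-1], key) if pulled else checkpoint
    return pulled, new_checkpoint


W1 = Instant("w1", "20230101160759320", "20230101160815000")
W2 = Instant("w2", "20230101160803680", "20230101160810000")
LAST_CHECKPOINT = "20230101160756490"


def run(key):
    # Pull 1 happens after w2 completes while w1 is still in flight,
    # so only w2 is visible as a completed instant.
    first, checkpoint = incremental_pull([W2], LAST_CHECKPOINT, key)
    # Pull 2 happens after w1 completes as well.
    second, _ = incremental_pull([W1, W2], checkpoint, key)
    return [i.writer for i in first + second]


if __name__ == "__main__":
    print(run("instant_time"))
    print(run("state_transition_time"))
```

Filtering on instant_time never returns w1, because the checkpoint has already
moved past its instant time. Filtering on state_transition_time returns both
commits, since w1 completes after the checkpoint taken from w2.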
This relates to [HUDI-1623|https://issues.apache.org/jira/browse/HUDI-1623],
but instead of adding the end time to the end of each commit, it uses
`FileStatus.getModificationTime` to represent the end time.
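As a rough local-filesystem analogy of that idea (not the Hadoop or Hudi API),
the sketch below reads the modification time of each completed commit file and
orders instants by it. The helper name completed_instants_by_mtime and the
assumption that a completed commit is a file named <instant_time>.commit in a
single directory are illustrative only.

```python
import os


def completed_instants_by_mtime(timeline_dir, suffix=".commit"):
    """Return (instant_time, mtime_ns) pairs for completed commit files,
    ordered by file modification time (used here as the end time)."""
    result = []
    for entry in os.scandir(timeline_dir):
        # Files such as "<instant>.commit.inflight" do not end with the
        # suffix, so only completed commits are picked up.
        if entry.is_file() and entry.name.endswith(suffix):
            instant_time = entry.name[: -len(suffix)]
            result.append((instant_time, entry.stat().st_mtime_ns))
    return sorted(result, key=lambda pair: pair[1])


def instants_after(timeline_dir, checkpoint_mtime_ns):
    """Instant times whose commit file was modified after the checkpoint."""
    return [
        instant
        for instant, mtime_ns in completed_instants_by_mtime(timeline_dir)
        if mtime_ns > checkpoint_mtime_ns
    ]
```

With this ordering, a commit that started earlier but finished later sorts
after the commit that finished first, so a checkpoint on modification time
does not skip it.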