[
https://issues.apache.org/jira/browse/HUDI-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hui An updated HUDI-5517:
-------------------------
Description:
Incremental pulls can miss instants on the Hudi timeline when the upstream
Hudi table is written by several writers.
For example, say two writers are writing data to the Hudi table, and the end
timestamp of the last successful incremental pull is 001.
w1 is writing 002 and w2 is writing 003. If w2 finishes before w1, the
incremental pull end timestamp is updated to 003, and w1's commit 002 is
skipped because its instant time is earlier than w2's.
We need to filter commits by commit end time (state transition time) when
pulling incrementally. Because w2's state transition time is earlier than
w1's, w1's commit will not be skipped.
This relates to HUDI-1623, but instead of adding the end time to the end of
each commit, it uses `FileStatus.getModificationTime` to represent the end
time.
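The scenario above can be sketched in plain Java. This is only an
illustration under stated assumptions, not Hudi code: `Commit`,
`IncrementalPullSketch` and both pull methods are hypothetical names, and the
state transition times are made-up millisecond values standing in for what
`FileStatus.getModificationTime` would return for the completed commit file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a completed commit; this is NOT Hudi's HoodieInstant.
final class Commit {
    final String instantTime;       // assigned when the writer starts
    final long stateTransitionTime; // when the commit completed (assumed: file modification time, ms)

    Commit(String instantTime, long stateTransitionTime) {
        this.instantTime = instantTime;
        this.stateTransitionTime = stateTransitionTime;
    }
}

final class IncrementalPullSketch {
    // Filter by instant time: keep commits that started after the checkpoint.
    static List<String> pullByInstantTime(List<Commit> completed, String checkpoint) {
        List<String> out = new ArrayList<>();
        for (Commit c : completed) {
            if (c.instantTime.compareTo(checkpoint) > 0) {
                out.add(c.instantTime);
            }
        }
        return out;
    }

    // Filter by state transition time: keep commits that completed after the checkpoint.
    static List<String> pullByStateTransitionTime(List<Commit> completed, long checkpoint) {
        List<String> out = new ArrayList<>();
        for (Commit c : completed) {
            if (c.stateTransitionTime > checkpoint) {
                out.add(c.instantTime);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Commit c001 = new Commit("001", 100L);
        Commit w1 = new Commit("002", 300L); // starts first, finishes last
        Commit w2 = new Commit("003", 200L); // starts later, finishes first

        // First pull: w2 has completed, w1 is still in flight.
        List<Commit> firstPull = List.of(c001, w2);
        // Second pull: both writers have completed.
        List<Commit> secondPull = List.of(c001, w1, w2);

        // Instant-time checkpoints: 001, then 003 after the first pull.
        System.out.println(pullByInstantTime(firstPull, "001"));  // [003]
        System.out.println(pullByInstantTime(secondPull, "003")); // [] (002 is missed)

        // State-transition checkpoints: 100, then 200 after the first pull.
        System.out.println(pullByStateTransitionTime(firstPull, 100L));  // [003]
        System.out.println(pullByStateTransitionTime(secondPull, 200L)); // [002]
    }
}
```

With the instant-time checkpoint, the second pull returns nothing and 002 is
lost. With the state-transition checkpoint, 002 is picked up because it
completed after 003 did.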
was:
Incremental pulls can miss instants on the Hudi timeline when the upstream
Hudi table is written by several writers.
For example, say two writers are writing data to the Hudi table, and the end
timestamp of the last successful incremental pull is 20230101160756490.
w1 is writing 20230101160759320 and w2 is writing 20230101160803680. If w2
finishes before w1, the incremental pull end timestamp is updated to
20230101160803680, and w1's commit 20230101160759320 is skipped because its
instant time is earlier than w2's.
We need to filter commits by commit end time (state transition time) when
pulling incrementally. Because w2's state transition time is earlier than
w1's, w1's commit will not be skipped.
This relates to [HUDI-1623|https://issues.apache.org/jira/browse/HUDI-1623],
but instead of adding the end time to the end of each commit, it uses
`FileStatus.getModificationTime` to represent the end time.
> HoodieTimeline support filter instants by state transition time
> ---------------------------------------------------------------
>
> Key: HUDI-5517
> URL: https://issues.apache.org/jira/browse/HUDI-5517
> Project: Apache Hudi
> Issue Type: New Feature
> Components: core, timeline-server
> Reporter: Hui An
> Priority: Major