Hui An created HUDI-5517:
----------------------------

             Summary: HoodieTimeline support filter instants by state 
transition time
                 Key: HUDI-5517
                 URL: https://issues.apache.org/jira/browse/HUDI-5517
             Project: Apache Hudi
          Issue Type: New Feature
          Components: core, timeline-server
            Reporter: Hui An


An incremental pull from an upstream Hudi table can miss instants on the 
timeline when that table is written by several writers.

For example, say two writers are writing data to the Hudi table, and the end 
timestamp of the last successful incremental pull is 20230101160756490.

w1 is writing instant 20230101160759320 and w2 is writing instant 
20230101160803680. If w2 finishes before w1, the incremental pull end 
timestamp is updated to 20230101160803680, and w1's commit 20230101160759320 
is skipped because its instant time is earlier than w2's.
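
The scenario can be reproduced with a small simulation. This is only a sketch 
of the checkpointing logic described above, not Hudi code: 
`pull_by_instant_time` is a hypothetical helper, and the timeline is modelled 
as a plain list of completed instant times.

```python
def pull_by_instant_time(completed_instants, checkpoint):
    """Hypothetical helper: return the completed instants whose instant time
    is later than the checkpoint, together with the advanced checkpoint."""
    new = sorted(t for t in completed_instants if t > checkpoint)
    next_checkpoint = new[-1] if new else checkpoint
    return new, next_checkpoint


W1 = "20230101160759320"  # w1 starts first but finishes last
W2 = "20230101160803680"  # w2 starts later but finishes first

# End timestamp of the last successful incremental pull.
checkpoint = "20230101160756490"

# First pull: only w2 has completed, so the checkpoint jumps to w2's instant.
first_pull, checkpoint = pull_by_instant_time([W2], checkpoint)

# Second pull: w1 has completed as well, but its instant time is already
# behind the checkpoint, so it is never returned.
second_pull, checkpoint = pull_by_instant_time([W1, W2], checkpoint)

print(first_pull)   # ['20230101160803680']
print(second_pull)  # []
```

The fixed-width timestamps compare correctly as strings, which is why no 
parsing is needed in the sketch.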

Incremental pulls need to filter commits by commit end time (state 
transition time) instead. Because w2's state transition time is earlier than 
w1's, w1's commit still falls after the updated end timestamp and is not 
filtered out.
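
A minimal sketch of the proposed filtering, under stated assumptions: the 
`Instant` class, the `pull_by_state_transition_time` helper, and the two 
state transition timestamps are made up for illustration (the issue does not 
give completion times), and the checkpoint is assumed to be tracked in state 
transition time rather than instant time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    instant_time: str           # when the writer started the commit
    state_transition_time: str  # when the commit completed (assumed values)


def pull_by_state_transition_time(completed, checkpoint):
    """Hypothetical helper: return instants that completed after the
    checkpoint, ordered by completion, plus the advanced checkpoint."""
    new = sorted(
        (i for i in completed if i.state_transition_time > checkpoint),
        key=lambda i: i.state_transition_time,
    )
    next_checkpoint = new[-1].state_transition_time if new else checkpoint
    return new, next_checkpoint


# w1 starts first but completes last; completion times are illustrative.
w1 = Instant("20230101160759320", state_transition_time="20230101160830000")
w2 = Instant("20230101160803680", state_transition_time="20230101160810000")

checkpoint = "20230101160756490"

# First pull: only w2 has completed.
first_pull, checkpoint = pull_by_state_transition_time([w2], checkpoint)

# Second pull: w1 completed after the checkpoint, so it is picked up even
# though its instant time is earlier than w2's.
second_pull, checkpoint = pull_by_state_transition_time([w1, w2], checkpoint)

print([i.instant_time for i in first_pull])   # ['20230101160803680']
print([i.instant_time for i in second_pull])  # ['20230101160759320']
```

Ordering by completion rather than by start means each commit is returned 
exactly once in this sketch, whichever writer finishes first.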

This relates to [HUDI-1623|https://issues.apache.org/jira/browse/HUDI-1623], 
but instead of adding the end time to the end of each commit, it uses 
`FileStatus.getModificationTime` to represent the end time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
