[ 
https://issues.apache.org/jira/browse/HUDI-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-8249:
--------------------------------------
    Sprint: Hudi 1.0 Sprint 2024/09/23-29

> Generate rollback timestamps earlier than ingestion commits with single writer
> ------------------------------------------------------------------------------
>
>                 Key: HUDI-8249
>                 URL: https://issues.apache.org/jira/browse/HUDI-8249
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: writer-core
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.16.0
>
>
> Our current rollback logic first generates the commit time for ingestion 
> in memory and only then checks for pending commits to roll back. If there 
> are any, we trigger rollbacks. As a result, the timeline can contain 
> out-of-order timestamps. This may not have any material impact, but it 
> would be nice to get it resolved. 
>  
>  
> Say we have t1.dc and t2.dc, and t2.dc crashed midway.
>  
> Then, say, we start t5.dc. Just as t5.dc starts, Hudi detects the pending 
> commit and triggers a rollback. This rollback gets an *instant time 
> of t6 (t6.rb). Note that the rollback's commit time is greater than t5, the 
> current ongoing delta commit.* 
> Once the rollback completes, we proceed with the ingestion writes. So this 
> is how the timeline might finally look: 
>  
> t1.dc.req, t1.dc.inflight, t1.dc.completed, 
> t6.rb.req, t6.rb.inflight, t6.rb.completed, 
> t5.dc.req, t5.dc.inflight, t5.dc.completed
>  
> So, why not fix the timestamp generation logic so that timestamps stay 
> monotonically increasing, at least for a single writer? 
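
A minimal Java sketch of the ordering difference described above. This is not
Hudi code: InstantClock, currentOrder, proposedOrder, and isMonotonic are
hypothetical names, and instants are plain counters rather than real Hudi
timestamps. It only illustrates that generating the rollback instant before
the ingestion instant keeps the timeline increasing for a single writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class RollbackOrderingSketch {

    /** Hypothetical instant generator: hands out strictly increasing numbers. */
    public static final class InstantClock {
        private final AtomicLong next;

        public InstantClock(long start) {
            this.next = new AtomicLong(start);
        }

        public long newInstant() {
            return next.getAndIncrement();
        }
    }

    /** Order described in the issue: ingestion instant first, rollbacks after. */
    public static List<String> currentOrder(InstantClock clock, int pendingCommits) {
        List<String> timeline = new ArrayList<>();
        long ingestion = clock.newInstant();          // generated up front, in memory
        for (int i = 0; i < pendingCommits; i++) {
            timeline.add("t" + clock.newInstant() + ".rb"); // rollback gets a later instant
        }
        timeline.add("t" + ingestion + ".dc");        // ingestion completes after the rollback
        return timeline;
    }

    /** Proposed order (assumption): rollback instants first, then ingestion. */
    public static List<String> proposedOrder(InstantClock clock, int pendingCommits) {
        List<String> timeline = new ArrayList<>();
        for (int i = 0; i < pendingCommits; i++) {
            timeline.add("t" + clock.newInstant() + ".rb");
        }
        timeline.add("t" + clock.newInstant() + ".dc");
        return timeline;
    }

    /** True if the numeric instants in the timeline are strictly increasing. */
    public static boolean isMonotonic(List<String> timeline) {
        long previous = Long.MIN_VALUE;
        for (String entry : timeline) {
            long instant = Long.parseLong(entry.substring(1, entry.indexOf('.')));
            if (instant <= previous) {
                return false;
            }
            previous = instant;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> current = currentOrder(new InstantClock(5), 1);
        List<String> proposed = proposedOrder(new InstantClock(5), 1);
        System.out.println(current + " monotonic=" + isMonotonic(current));
        System.out.println(proposed + " monotonic=" + isMonotonic(proposed));
    }
}
```

With the clock starting at 5 and one pending commit, currentOrder yields t6.rb
followed by t5.dc, matching the timeline in the description, while
proposedOrder yields t5.rb followed by t6.dc.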



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
