[
https://issues.apache.org/jira/browse/HUDI-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan updated HUDI-8249:
--------------------------------------
Remaining Estimate: 3h
Original Estimate: 3h
> Generate rollback timestamps earlier than ingestion commits with single writer
> ------------------------------------------------------------------------------
>
> Key: HUDI-8249
> URL: https://issues.apache.org/jira/browse/HUDI-8249
> Project: Apache Hudi
> Issue Type: Improvement
> Components: writer-core
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Fix For: 0.16.0, 1.0.0
>
> Original Estimate: 3h
> Remaining Estimate: 3h
>
> Our current rollback logic first generates the commit time for ingestion
> in memory and only then checks for pending commits to roll back. If there
> are any, we trigger rollbacks. This means the timeline can contain
> out-of-order timestamps. It may not have any material impact, but it would
> be nice to get this resolved.
>
>
> Say we have t1.dc and t2.dc, and t2.dc crashed midway.
>
> Then, say, we start t5.dc. Just as t5.dc starts, Hudi detects the pending
> commit and triggers a rollback. This rollback gets an *instant time
> of t6 (t6.rb). Note that the rollback's instant time is greater than t5, the
> current ongoing delta commit.*
> Once the rollback completes, we proceed with the ingestion writes. So this
> is how the timeline might finally look:
>
> t1.dc.req, t1.dc.inflight, t1.dc.completed,
> t6.rb.req, t6.rb.inflight, t6.rb.completed,
> t5.dc.req, t5.dc.inflight, t5.dc.completed
>
> So why not fix the timestamp generation logic so that we try to keep instant
> times monotonically increasing, at least for a single writer?
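
The difference between the two orderings can be illustrated with a toy
model. This is only a sketch under assumptions, not Hudi code: the Timeline
class, new_instant_time, rollback_pending, ingest_current and ingest_proposed
are hypothetical names, and a simple counter stands in for instant time
generation. It replays the t1/t2/t5/t6 scenario from the description.

```python
from itertools import count


class Timeline:
    """Toy single-writer timeline; hypothetical, not Hudi's real timeline class."""

    def __init__(self, next_ts):
        self._clock = count(next_ts)   # stand-in for instant time generation
        self.pending = []              # instant times of crashed delta commits
        self.events = []               # (instant time, action) in completion order

    def new_instant_time(self):
        return next(self._clock)

    def complete(self, ts, action):
        self.events.append((ts, action))


def rollback_pending(tl):
    # Each rollback gets its own freshly generated instant time.
    while tl.pending:
        tl.pending.pop(0)
        tl.complete(tl.new_instant_time(), "rb")


def ingest_current(tl):
    commit_ts = tl.new_instant_time()   # commit time is generated first
    rollback_pending(tl)                # so the rollback receives a later time
    tl.complete(commit_ts, "dc")


def ingest_proposed(tl):
    rollback_pending(tl)                # rollback takes the earlier time
    commit_ts = tl.new_instant_time()   # commit time is generated afterwards
    tl.complete(commit_ts, "dc")


def run(ingest):
    tl = Timeline(next_ts=5)
    tl.complete(1, "dc")    # t1.dc completed
    tl.pending.append(2)    # t2.dc crashed midway
    ingest(tl)
    return [f"t{ts}.{action}" for ts, action in tl.events]


print(run(ingest_current))    # ['t1.dc', 't6.rb', 't5.dc']
print(run(ingest_proposed))   # ['t1.dc', 't5.rb', 't6.dc']
```

With the proposed ordering the rollback takes the earlier instant time and
the delta commit the later one, so the toy timeline stays monotonically
increasing for a single writer.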
--
This message was sent by Atlassian Jira
(v8.20.10#820010)