sivabalan narayanan created HUDI-8249:
-----------------------------------------
Summary: Generate rollback timestamps earlier than ingestion
commits with single writer
Key: HUDI-8249
URL: https://issues.apache.org/jira/browse/HUDI-8249
Project: Apache Hudi
Issue Type: Improvement
Components: writer-core
Reporter: sivabalan narayanan
Our current rollback logic first generates the commit time for ingestion
in memory and only then checks for pending commits to roll back. If there are
any, we trigger the rollbacks at that point. As a result, the timeline can
contain out-of-order timestamps. This may not have any material impact, but
it would be nice to get it resolved.
Say we have t1.dc and t2.dc, and t2.dc crashed midway.
Then we start, say, t5.dc. Just as t5.dc starts, Hudi detects the pending commit
and triggers a rollback. This rollback gets an *instant time of t6
(t6.rb). Note that the rollback's commit time is greater than t5, the currently
ongoing delta commit.*
Once the rollback completes, we proceed with the ingestion writes. So this is
how the timeline might finally look:
t1.dc.req, t1.dc.inflight, t1.dc.completed,
t6.rb.req, t6.rb.inflight, t6.rb.completed,
t5.dc.req, t5.dc.inflight, t5.dc.completed
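
The ordering above can be reproduced with a small toy model. This is only a sketch: the counter stands in for the instant-time generator, and next_instant and start_write_current_order are hypothetical names for illustration, not Hudi APIs.

```python
import itertools

# Hypothetical logical clock standing in for the instant-time generator.
# It starts at 5 so that the first instant handed out is "t5".
_clock = itertools.count(5)


def next_instant():
    return next(_clock)


def start_write_current_order(pending):
    """Model of the order described above: reserve the ingestion instant
    first, then stamp rollbacks for any pending commits."""
    timeline = []
    ingest = next_instant()                  # t5 is reserved in memory first
    for failed in pending:
        rollback = next_instant()            # rollback is stamped later: t6
        timeline.append((rollback, "rb", failed))
    timeline.append((ingest, "dc", None))    # t5.dc is written after t6.rb
    return timeline


# t2.dc is the pending (crashed) delta commit.
timeline = start_write_current_order(pending=[2])
instants = [instant for instant, _, _ in timeline]
is_monotonic = instants == sorted(instants)
```

In this model the rollback lands in the timeline before the delta commit while carrying the larger instant time, so is_monotonic ends up False.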
So why not fix the timestamp generation logic so that timestamps stay
monotonically increasing, at least for a single writer?
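
One possible shape of the fix, sketched under the assumption that a single writer can simply check for pending commits before it reserves the ingestion instant. The function name and the counter-based clock are hypothetical and only illustrate the ordering, not the actual writer code.

```python
import itertools


def start_write_proposed_order(clock, pending):
    """Assumed reordering: stamp rollbacks for pending commits first,
    and only then reserve the instant for the new ingestion commit."""
    timeline = []
    for failed in pending:
        timeline.append((next(clock), "rb", failed))   # rollback stamped first
    timeline.append((next(clock), "dc", None))         # ingestion stamped last
    return timeline


# Hypothetical logical clock; the next instant handed out is 5.
clock = itertools.count(5)

# t2.dc is the pending (crashed) delta commit.
timeline = start_write_proposed_order(clock, pending=[2])
instants = [instant for instant, _, _ in timeline]
is_monotonic = instants == sorted(instants)
```

With this order the rollback takes the earlier instant and the new delta commit the later one, so the instants appear in increasing order for a single writer.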
--
This message was sent by Atlassian Jira
(v8.20.10#820010)