[ 
https://issues.apache.org/jira/browse/HUDI-8249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884070#comment-17884070
 ] 

sivabalan narayanan edited comment on HUDI-8249 at 9/24/24 12:49 AM:
---------------------------------------------------------------------

Approach A:

Introduce lazy instant time generation for ingestion writes. 

Challenge:
 - In Deltastreamer use cases, the instant time is used to prepare records 
(auto record key generation), and that preparation is triggered before 
rollbacks can kick in. 

So we can't really make instant time generation lazy for ingestion writes. 
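
A minimal sketch of why laziness alone does not help here. It is a toy model, 
not Hudi code: LazyInstantSketch, newInstantTime(), lazyInstantTime() and the 
"<instant>_0" record key format are all assumptions made for illustration. 

```java
import java.util.function.Supplier;

// Toy model only; none of these names are Hudi APIs.
class LazyInstantSketch {
    private int nextInstant;

    LazyInstantSketch(int firstInstant) {
        this.nextInstant = firstInstant;
    }

    // Hands out strictly increasing toy instant times.
    int newInstantTime() {
        return nextInstant++;
    }

    // Lazy, memoized instant time: generated on the first get(), then reused.
    Supplier<Integer> lazyInstantTime() {
        return new Supplier<Integer>() {
            private Integer value;

            @Override
            public Integer get() {
                if (value == null) {
                    value = newInstantTime();
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        LazyInstantSketch generator = new LazyInstantSketch(5);
        Supplier<Integer> ingestionInstant = generator.lazyInstantTime();

        // Record preparation needs the instant time up front (hypothetical key
        // format), so the lazy value is forced before any rollback runs.
        String recordKey = ingestionInstant.get() + "_0";

        // The rollback is scheduled afterwards and still gets the later time.
        int rollbackInstant = generator.newInstantTime();

        System.out.println("record key: " + recordKey);
        System.out.println("ingestion instant: " + ingestionInstant.get());
        System.out.println("rollback instant: " + rollbackInstant);
    }
}
```

In this toy run the ingestion instant is fixed as soon as the record key is 
built, so the rollback still receives the larger value. 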

 

Approach B:

Pass the rollback instant time as an argument to the startCommit() method in 
BaseHoodieWriteClient. 

Challenge: More than one write could have failed, so a single rollback instant 
time would not cover them all. We can't really get this done elegantly. 

 

Approach C: 

Make rollbackFailedWrites a separate public method in the write client, and 
expect every writer to call it before calling startCommit(). 
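
A minimal sketch of how Approach C might look from the caller's side, using a 
toy in-memory client rather than Hudi's actual write client. ToyWriteClient, 
startAndCrash(), the t<N> counter and the .dc/.rb suffixes are assumptions for 
illustration; only the method names rollbackFailedWrites and startCommit come 
from the comment above. 

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a write client; this is NOT BaseHoodieWriteClient.
class ToyWriteClient {
    private int nextInstant = 1;                          // toy monotonic instant-time source
    final List<String> pendingWrites = new ArrayList<>(); // failed writes awaiting rollback
    final List<String> timeline = new ArrayList<>();      // instants in creation order

    private String newInstantTime() {
        return "t" + nextInstant++;
    }

    // Starts a delta commit and records it on the toy timeline.
    String startCommit() {
        String instant = newInstantTime() + ".dc";
        timeline.add(instant);
        return instant;
    }

    // Simulates a write that starts and then crashes midway.
    String startAndCrash() {
        String instant = startCommit();
        pendingWrites.add(instant);
        return instant;
    }

    // Approach C: a separate public step that rolls back every pending write,
    // giving each rollback its own freshly generated instant time.
    List<String> rollbackFailedWrites() {
        List<String> rollbacks = new ArrayList<>();
        for (int i = 0; i < pendingWrites.size(); i++) {
            String rollback = newInstantTime() + ".rb";
            timeline.add(rollback);
            rollbacks.add(rollback);
        }
        pendingWrites.clear();
        return rollbacks;
    }

    public static void main(String[] args) {
        ToyWriteClient client = new ToyWriteClient();
        client.startCommit();          // a write that completes
        client.startAndCrash();        // a write that fails
        client.rollbackFailedWrites(); // the writer calls this first...
        client.startCommit();          // ...and only then starts the new commit
        System.out.println(client.timeline);
    }
}
```

Because the writer calls rollbackFailedWrites() before startCommit(), the 
rollback in this toy model always receives the smaller instant time. 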

 

 

Given that https://issues.apache.org/jira/browse/HUDI-8248 is only required for 
the 0.x branch, we are wondering whether this fix is really required in 0.x, or 
whether we should leave things as is: 8242 already solves the data consistency 
issues, and the current ticket is just standardization or refactoring to keep 
things straightforward and to guard against future bugs. 

 

 

 



> Generate rollback timestamps earlier than ingestion commits with single writer
> ------------------------------------------------------------------------------
>
>                 Key: HUDI-8249
>                 URL: https://issues.apache.org/jira/browse/HUDI-8249
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: writer-core
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.16.0
>
>
> Our current rollback logic first generates the commit time for ingestion 
> in memory and then checks for any pending commits to roll back. If there are 
> any, we trigger rollbacks. This means the timeline can contain out-of-order 
> timestamps. It may not have any material impact, but it would be nice to get 
> this resolved. 
>  
>  
> Say we have t1.dc and t2.dc, and t2.dc crashed midway.
>  
> Then, say, we start t5.dc. Just as we start t5.dc, Hudi detects the pending 
> commit and triggers a rollback. This rollback gets an *instant time of t6 
> (t6.rb). Note that the rollback's commit time is greater than t5, the 
> currently ongoing delta commit.* 
> Once the rollback completes, we proceed with the ingestion write. So this is 
> how the timeline might finally look: 
>  
> t1.dc.req, t1.dc.inflight, t1.dc.completed, 
> t6.rb.req, t6.rb.inflight, t6.rb.completed, 
> t5.dc.req, t5.dc.inflight, t5.dc.completed
>  
> So why not fix the timestamp generation logic so that timestamps stay 
> monotonically increasing, at least for a single writer? 
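
The ordering described in the quoted issue can be sketched with a toy model 
that compares the two generation orders. RollbackOrderingSketch and its 
methods are hypothetical names, and plain integers stand in for instant times; 
this is not how Hudi generates timestamps. 

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the two orderings; not Hudi code.
class RollbackOrderingSketch {

    // Behavior described in the issue: the ingestion instant time is generated
    // first, and the rollback (if any) gets the next, larger instant time even
    // though it runs and completes first.
    static List<Integer> commitTimeFirst(int nextTime, boolean hasPendingCommit) {
        List<Integer> executionOrder = new ArrayList<>();
        int ingestion = nextTime;
        if (hasPendingCommit) {
            int rollback = nextTime + 1;
            executionOrder.add(rollback);
        }
        executionOrder.add(ingestion);
        return executionOrder;
    }

    // Proposed ordering: roll back first, then generate the ingestion time.
    static List<Integer> rollbackTimeFirst(int nextTime, boolean hasPendingCommit) {
        List<Integer> executionOrder = new ArrayList<>();
        int next = nextTime;
        if (hasPendingCommit) {
            executionOrder.add(next);
            next++;
        }
        executionOrder.add(next);
        return executionOrder;
    }

    // True when every instant time is strictly greater than the previous one.
    static boolean isMonotonic(List<Integer> times) {
        for (int i = 1; i < times.size(); i++) {
            if (times.get(i) <= times.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> current = commitTimeFirst(5, true);
        List<Integer> proposed = rollbackTimeFirst(5, true);
        System.out.println(current + " monotonic: " + isMonotonic(current));
        System.out.println(proposed + " monotonic: " + isMonotonic(proposed));
    }
}
```

With a starting time of 5 and one pending commit, the first method yields the 
t6-before-t5 execution order from the example, while the second keeps the 
order increasing. 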



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
