[ 
https://issues.apache.org/jira/browse/HUDI-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884391#comment-17884391
 ] 

sivabalan narayanan edited comment on HUDI-8248 at 9/24/24 5:42 PM:
--------------------------------------------------------------------

Thoughts on fixing the issue: 
 - We could tackle this from a few different angles. A: Fix all callers to 
pass in the right value for maxInstantTime/latestInstantTime. B: Fix only the 
log record reader logic, so that rollback blocks are processed regardless of 
maxInstantTime/latestInstantTime. C: Fix the rollback timestamp generation 
logic so that a rollback always gets an earlier commit time than the 
deltacommit that happens after it.

 

A: This may not be foolproof, since for an incremental query the end instant 
is not controlled by Hudi programmatically. It's what the user/consumer sets. 
So even if we try to fix all callers, we might still have some gaps. 

B: This seems logical, because rollback is something Hudi manages internally 
to ensure partially failed log files are not exposed to readers. So we could 
process all rollback blocks in a file slice without adhering to the 
maxInstantTime/latestInstantTime. Data blocks could be left as is, i.e. any 
data block with an instant time greater than maxInstantTime/latestInstantTime 
will still be skipped by the LogRecordReader. 
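
To make (B) concrete, here is a minimal sketch that replays the layout from 
the issue description (lf1 from t2, rollback block lf3 at t6, lf4 from t5) 
against a toy reader. All class and method names below (RollbackAwareScanSketch, 
LogBlock, scanCurrent, scanWithOptionB) are hypothetical and are not Hudi's 
actual classes. Instant times are modelled as plain ints, and the two-pass 
structure is chosen for brevity, so the real fix may well be structured 
differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a file slice's log blocks. These are NOT Hudi's classes.
final class RollbackAwareScanSketch {

    enum BlockType { DATA, ROLLBACK }

    static final class LogBlock {
        final String name;        // e.g. "lf1"
        final BlockType type;
        final int instantTime;    // instant time of this block (t2 -> 2)
        final int targetInstant;  // ROLLBACK only: the instant being rolled back; -1 for DATA

        LogBlock(String name, BlockType type, int instantTime, int targetInstant) {
            this.name = name;
            this.type = type;
            this.instantTime = instantTime;
            this.targetInstant = targetInstant;
        }
    }

    // Layout from the issue description: lf1 (t2, partially failed), lf3 (rollback at t6), lf4 (t5).
    static List<LogBlock> issueScenario() {
        return Arrays.asList(
            new LogBlock("lf1", BlockType.DATA, 2, -1),
            new LogBlock("lf3", BlockType.ROLLBACK, 6, 2),
            new LogBlock("lf4", BlockType.DATA, 5, -1));
    }

    // Behaviour described in the issue: stop at the first block newer than maxInstantTime.
    static List<String> scanCurrent(List<LogBlock> blocks, int maxInstantTime) {
        List<LogBlock> kept = new ArrayList<>();
        for (LogBlock block : blocks) {
            if (block.instantTime > maxInstantTime) {
                break; // bails out, so the rollback and everything after it is never seen
            }
            if (block.type == BlockType.ROLLBACK) {
                kept.removeIf(k -> k.instantTime == block.targetInstant);
            } else {
                kept.add(block);
            }
        }
        List<String> names = new ArrayList<>();
        for (LogBlock block : kept) {
            names.add(block.name);
        }
        return names;
    }

    // Option (B): rollback blocks are always honoured; only data blocks are bounded by maxInstantTime.
    static List<String> scanWithOptionB(List<LogBlock> blocks, int maxInstantTime) {
        Set<Integer> rolledBack = new HashSet<>();
        for (LogBlock block : blocks) {
            if (block.type == BlockType.ROLLBACK) {
                rolledBack.add(block.targetInstant); // no maxInstantTime check here
            }
        }
        List<String> visible = new ArrayList<>();
        for (LogBlock block : blocks) {
            boolean isData = block.type == BlockType.DATA;
            boolean inRange = block.instantTime <= maxInstantTime;
            if (isData && inRange && !rolledBack.contains(block.instantTime)) {
                visible.add(block.name);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<LogBlock> blocks = issueScenario();
        System.out.println(scanCurrent(blocks, 5));      // prints [lf1]
        System.out.println(scanWithOptionB(blocks, 5));  // prints [lf4]
    }
}
```

With maxInstantTime = t5, the toy "current" scan exposes the partially failed 
lf1 and never reaches lf4, whereas the (B)-style scan drops lf1 and returns 
lf4. 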

C: (B) alone should be good enough to fix the data consistency issues. But for 
a single writer, the main issue stems from the fact that the rollback's commit 
time is greater than that of the next delta commit, which goes on to succeed. 
So it's worth exploring this as well, in addition to (B).  

Some challenges to call out on C:

In 1.x, our log files are expected to be ordered by completion time, so this 
should not be an issue. That leaves us with just the 0.x branch. Even there, 
we can't really control the timestamps in multi-writer scenarios. So fixing 
the rollback commit time generation logic might help with a single writer, 
but may not have any impact on multi-writers. Again, to reiterate, even for a 
single writer, fix (B) should suffice. (C) is just to bring some ordering to 
timestamps among actions in the timeline. 

I am filing https://issues.apache.org/jira/browse/HUDI-8249 to go into 
specifics of the fix and challenges. 

 

At a high level, [https://github.com/apache/hudi/pull/11990] fixes (B). 
(C) is pending discussion.


> Fix LogRecord reader to account for rollback blocks with higher timestamps
> --------------------------------------------------------------------------
>
>                 Key: HUDI-8248
>                 URL: https://issues.apache.org/jira/browse/HUDI-8248
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: reader-core
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.16.0, 1.0.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With LogRecordReader, we also configure a maxInstant time to read. Sometimes 
> rollback blocks could have higher timestamps than the maxInstant set, which 
> might lead to data inconsistencies.  
>  
> Let's go through an illustration:
> Say we have t1.dc and t2.dc, and t2.dc crashed midway.
> Current layout is,
> {{base file(t1), lf1(partially committed data w/ t2 as instant time)}}
>  
> Then, say, we start t5.dc. Just as we start t5.dc, Hudi detects the pending 
> commit and triggers a rollback. This rollback gets an instant time of t6 
> (t6.rb). Note that the rollback's commit time is greater than t5, i.e. the 
> currently ongoing delta commit.
> So, once rollback completes, this is the layout.
> {{base file, lf1(from t2.dc partially failed), lf3 (rollback command block 
> with t6).}}
>  
> And once t5.dc completes, this is what the layout looks like:
> {{base file, lf1(from t2.dc partially failed), lf3 (rollback command block 
> with t6), lf4 (from t5)}}
>  
> At this point, when we trigger a snapshot read or tagLocation with a global 
> index, maxInstant is set to the last entry in the commits timeline, which is 
> t5. So when the LogRecordReader processes the log blocks and reaches lf3, it 
> detects that the timestamp t6 > t5 (i.e. the max instant time) and bails out 
> of the for loop. So, in essence, it may not even read lf4 in the above 
> scenario.
>  


