[ https://issues.apache.org/jira/browse/HUDI-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884391#comment-17884391 ]
sivabalan narayanan edited comment on HUDI-8248 at 9/24/24 5:42 PM:
--------------------------------------------------------------------
Thoughts on fixing the issue:
- We could tackle this from a few different angles. A: Fix all callers
to pass in the right value for maxInstantTime/latestInstantTime. B: Fix
the log record reader logic so that rollback blocks are processed
regardless of maxInstantTime/latestInstantTime. C: Fix the rollback timestamp
generation logic so that a rollback always gets an instant time lower than
that of the deltacommit that follows it.
A: This may not be foolproof, since for an incremental query the end instant is
not controlled by Hudi programmatically; it is whatever the user/consumer
sets. So even if we fix all callers, we might still have some gaps.
B: This seems logical, because rollback is something Hudi manages internally to
ensure partially failed log files are not exposed to readers. So we could
process all rollback blocks in a file slice without adhering to
maxInstantTime/latestInstantTime. Data blocks can be left as is, i.e. any
data block with an instant time greater than
maxInstantTime/latestInstantTime will still be ignored by the
LogRecordReader.
C: (B) alone should be good enough to fix the data consistency issues. But for
a single writer, the main issue stems from the fact that the rollback's commit
time is greater than that of the next delta commit, which goes on to succeed.
So it's worth exploring this as well, in addition to (B).
Some challenges to call out on C:
In 1.x, our log files are expected to be ordered by completion time, so
this should not be an issue there. That leaves us with just the 0.x branch.
Even there, we can't really control the timestamps in multi-writer scenarios.
So fixing the rollback commit time generation logic might help with a single
writer, but may not have any impact on multi-writers. Again, to reiterate, even
for a single writer, fix (B) should suffice. (C) is just to bring some
ordering to timestamps among actions in the timeline.
I am filing https://issues.apache.org/jira/browse/HUDI-8249 to go into the
specifics of the fix and its challenges.
At a high level,
[https://github.com/apache/hudi/pull/11990] fixes (B). (C) is pending
discussion.
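A minimal Java sketch of the idea behind (B), for illustration only. It is not
the code from the pull request and does not use Hudi's classes: the LogBlock
type, its fields, and the scan() method are all hypothetical. It assumes a
rollback block names the instant it rolls back, and it shows rollback blocks
being honored regardless of maxInstantTime while data blocks are still
filtered by it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a log scan; none of these types are Hudi classes.
final class RollbackAwareScanSketch {

    enum BlockType { DATA, ROLLBACK }

    static final class LogBlock {
        final String logFile;      // e.g. "lf1"
        final BlockType type;
        final long instantTime;    // instant that wrote this block
        final long targetInstant;  // ROLLBACK only: the instant being rolled back (-1 for DATA)

        LogBlock(String logFile, BlockType type, long instantTime, long targetInstant) {
            this.logFile = logFile;
            this.type = type;
            this.instantTime = instantTime;
            this.targetInstant = targetInstant;
        }
    }

    // Layout from the issue description: lf1 (t2, partially failed),
    // lf3 (rollback block with t6, rolling back t2), lf4 (t5).
    static List<LogBlock> illustrationLayout() {
        return List.of(
            new LogBlock("lf1", BlockType.DATA, 2, -1),
            new LogBlock("lf3", BlockType.ROLLBACK, 6, 2),
            new LogBlock("lf4", BlockType.DATA, 5, -1));
    }

    // Returns the log files whose data blocks remain visible at maxInstantTime.
    static List<String> scan(List<LogBlock> blocks, long maxInstantTime) {
        // Pass 1: collect rolled-back instants from every rollback block,
        // ignoring the rollback block's own instant time.
        Set<Long> rolledBack = new HashSet<>();
        for (LogBlock block : blocks) {
            if (block.type == BlockType.ROLLBACK) {
                rolledBack.add(block.targetInstant);
            }
        }
        // Pass 2: data blocks still honor maxInstantTime.
        List<String> visible = new ArrayList<>();
        for (LogBlock block : blocks) {
            if (block.type != BlockType.DATA) {
                continue;
            }
            if (block.instantTime > maxInstantTime) {
                continue; // skip this block, but keep scanning
            }
            if (rolledBack.contains(block.instantTime)) {
                continue; // written by a rolled-back instant
            }
            visible.add(block.logFile);
        }
        return visible;
    }

    public static void main(String[] args) {
        System.out.println(scan(illustrationLayout(), 5)); // prints [lf4]
    }
}
```

With maxInstantTime = t5, the t6 rollback block still removes lf1's partial
data and lf4 is read. The two-pass structure is only one way to express this;
a single pass that never bails out on rollback blocks would behave the same in
this model.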
> Fix LogRecord reader to account for rollback blocks with higher timestamps
> --------------------------------------------------------------------------
>
> Key: HUDI-8248
> URL: https://issues.apache.org/jira/browse/HUDI-8248
> Project: Apache Hudi
> Issue Type: Improvement
> Components: reader-core
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.16.0, 1.0.0
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> With LogRecordReader, we also configure a maxInstant time to read up to.
> Sometimes rollback blocks can have higher timestamps than the maxInstant that
> is set, which might lead to data inconsistencies.
>
> Let's go through an illustration:
> Say we have t1.dc and t2.dc, and t2.dc crashed midway.
> The current layout is:
> {{base file(t1), lf1(partially committed data w/ t2 as instant time)}}
>
> Then, say, we start t5.dc. Just as t5.dc starts, Hudi detects the pending
> commit and triggers a rollback. This rollback gets an instant time of
> t6 (t6.rb). Note that the rollback's commit time is greater than t5, the
> ongoing delta commit.
> Once the rollback completes, this is the layout:
> {{base file, lf1(from t2.dc partially failed), lf3 (rollback command block
> with t6).}}
>
> And once t5.dc completes, this is how the layout looks:
> {{base file, lf1(from t2.dc partially failed), lf3 (rollback command block
> with t6), lf4 (from t5)}}
>
> At this point, when we trigger a snapshot read or tagLocation with a global
> index, maxInstant is set to the last entry in the commits timeline, which is
> t5. So while the LogRecordReader is processing the log blocks and reaches
> lf3, it detects that the timestamp t6 > t5 (i.e. the max instant time) and
> bails out of the for loop. So, in essence, it may not even read lf4 in the
> above scenario.
>
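To make the illustration concrete, here is a small Java sketch of the bail-out
behavior described in the issue. It is a simplified model, not Hudi's reader:
the LogBlock type and the method name are hypothetical, and it only tracks
which blocks the loop visits before stopping (it does not model how the
partially failed lf1 data is treated).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the scan loop described in the issue; not a Hudi class.
final class BailOutScanSketch {

    static final class LogBlock {
        final String logFile;    // e.g. "lf3"
        final long instantTime;  // instant time recorded in the block

        LogBlock(String logFile, long instantTime) {
            this.logFile = logFile;
            this.instantTime = instantTime;
        }
    }

    // Layout after t5.dc completes: lf1 (t2), lf3 (rollback block, t6), lf4 (t5).
    static List<LogBlock> illustrationLayout() {
        return List.of(
            new LogBlock("lf1", 2),
            new LogBlock("lf3", 6),
            new LogBlock("lf4", 5));
    }

    // Visits blocks in order and stops at the first block whose instant time
    // exceeds maxInstant, as the issue describes.
    static List<String> blocksVisitedBeforeBailOut(List<LogBlock> blocks, long maxInstant) {
        List<String> visited = new ArrayList<>();
        for (LogBlock block : blocks) {
            if (block.instantTime > maxInstant) {
                break; // bail out: later blocks are never looked at
            }
            visited.add(block.logFile);
        }
        return visited;
    }

    public static void main(String[] args) {
        // maxInstant = t5: the loop stops at lf3 (t6) and never reaches lf4.
        System.out.println(blocksVisitedBeforeBailOut(illustrationLayout(), 5)); // prints [lf1]
    }
}
```

In this model the rollback block in lf3 is never applied and lf4 is never
reached, which matches the inconsistency the illustration walks through.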
--
This message was sent by Atlassian Jira
(v8.20.10#820010)