sivabalan narayanan created HUDI-8451:
-----------------------------------------
Summary: Followup to fix all callers to HoodieLogRecordReader to
set the right value for max instant time
Key: HUDI-8451
URL: https://issues.apache.org/jira/browse/HUDI-8451
Project: Apache Hudi
Issue Type: Improvement
Components: reader-core
Reporter: sivabalan narayanan
As part of [https://github.com/apache/hudi/pull/12033], we fixed an issue where
the log record reader failed to read a data block in some edge cases.
The fix ensured that the log record reader accounts for all rollback blocks,
disregarding the max instant time configured for the reader.
But let's also follow through and see whether we can fix all callers to set the
right value for the max instant time.
Say we have t1.dc and t2.dc, and t2.dc crashed midway.
The current layout is:
base file (t1), lf1 (partially committed data w/ t2 as instant time)
Then, say, we start t5.dc. Just as t5.dc starts, Hudi detects the pending commit
and triggers a rollback. This rollback gets an *instant time of t6
(t6.rb). Note that the rollback's commit time is greater than t5, the currently
ongoing delta commit.*
So, once the rollback completes, this is the layout:
base file, lf1 (from t2.dc, partially failed), lf3 (rollback command block with
t6).
And once t5.dc completes, this is how the layout looks:
base file, lf1 (from t2.dc, partially failed), *lf3 (rollback command block with
t6), lf4 (from t5)*
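To make the scenario concrete, here is a toy model of the final layout in Python. It is only a sketch under assumptions, not Hudi's actual reader: the `Block` type, the `scan` function and its `honor_all_rollbacks` flag are hypothetical names, instants are plain integers (t2 becomes 2), and filtering of uncommitted instants is not modeled. It shows that a reader which stops at the first block beyond its max instant time never reaches lf4 when the caller passes t5, whereas passing t6, or always honoring rollback blocks, exposes lf4.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for a log block; not a Hudi class."""
    log_file: str
    kind: str                     # "data" or "rollback"
    instant: int                  # instant time of the block (t2 -> 2)
    target: Optional[int] = None  # for rollback blocks: the instant rolled back


def scan(blocks: List[Block], max_instant: int, honor_all_rollbacks: bool) -> List[str]:
    """Return the log files whose data blocks stay visible after the scan."""
    visible: List[Block] = []
    for block in blocks:
        if block.kind == "rollback" and (honor_all_rollbacks or block.instant <= max_instant):
            # Apply the rollback: drop data blocks written by the target instant.
            visible = [b for b in visible if b.instant != block.target]
            continue
        if block.instant > max_instant:
            if honor_all_rollbacks:
                continue  # skip this block but keep scanning
            break         # assumed "break" behavior: stop at the first newer block
        visible.append(block)
    return [b.log_file for b in visible]


# Layout after t5.dc completes: lf1 (t2, partial), lf3 (rollback at t6), lf4 (t5).
LAYOUT = [
    Block("lf1", "data", 2),
    Block("lf3", "rollback", 6, target=2),
    Block("lf4", "data", 5),
]

# Caller passes t5: the scan breaks at the t6 rollback block and never sees lf4.
missed = scan(LAYOUT, max_instant=5, honor_all_rollbacks=False)

# Caller passes t6 (the rollback's instant time): lf1 is rolled back, lf4 is read.
caller_fixed = scan(LAYOUT, max_instant=6, honor_all_rollbacks=False)

# Reader honors rollback blocks regardless of the max instant time.
reader_fixed = scan(LAYOUT, max_instant=5, honor_all_rollbacks=True)
```

In this toy, `missed` does not contain lf4, while both `caller_fixed` and `reader_fixed` contain only lf4. The two fixes are complementary: the reader-side change protects callers that pass a stale max instant time, and the caller-side change is what this ticket tracks.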
Callers involved:
* This affects global indexes (simple, bloom) because deletes are not applied.
For non-global indexes we read base files, and with only updates in the log,
tagging for non-global (bloom/simple) is not affected.
* Once there is a new commit, snapshot queries will start returning lf4
(almost eventually consistent behavior).
** Spark does not factor rollbacks (RBs) into latestInstantTime.
** Hive/Trino/Presto: if they all use the inputFormat,
{{BaseHoodieFileIndex#getLatestCompletedInstant}} handles this.
** Flink (FormatUtils is not handling this).
* CDC: also has issues, irrespective of whether the end instant time is set by
the user.
* Incremental queries: just fixing the lastInstant time alone may not suffice,
since the instant time might be set by the user. So, we might have to remove
the "break" from within logRecordReader.
* What about indexing, i.e. all the new indexes added in 1.x?
* If clustering is scheduled right after this, or executed inline right
after this ➝ this is not an issue, since clustering passes in its own instant
time as latestInstantTime, which passes the check and exposes lf4.
* If compaction is scheduled right after this, or executed inline right
after this ➝ compaction accounts for the rollback by passing a
lastInstantTime that includes the rollback ts.
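The compaction behavior described above, passing a last instant time that includes the rollback, could be sketched as follows. This is a hypothetical helper, not a Hudi API: `latest_instant_for_reader`, the `(instant, action)` tuples and the action names are assumptions made for illustration, with instants again modeled as integers.

```python
from typing import Iterable, Tuple


def latest_instant_for_reader(
    completed: Iterable[Tuple[int, str]], include_rollbacks: bool
) -> int:
    """Pick the max instant time a caller would hand to the log record reader.

    `completed` holds (instant, action) pairs for completed timeline entries.
    """
    actions = {"commit", "deltacommit"}
    if include_rollbacks:
        actions = actions | {"rollback"}
    return max(instant for instant, action in completed if action in actions)


# Completed timeline from the example: t1.dc, t6.rb, t5.dc.
TIMELINE = [(1, "deltacommit"), (6, "rollback"), (5, "deltacommit")]

# Ignoring rollbacks yields t5, which is lower than the rollback block's t6.
without_rb = latest_instant_for_reader(TIMELINE, include_rollbacks=False)

# Including rollbacks yields t6, so the rollback block is within range.
with_rb = latest_instant_for_reader(TIMELINE, include_rollbacks=True)
```

Callers that derive the value the first way are the ones this ticket proposes to audit.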