Prashant Wason created HUDI-6114:
------------------------------------
Summary: Rollback handling in AbstractHoodieLogRecordReader may
not work correctly when multi-writer is enabled
Key: HUDI-6114
URL: https://issues.apache.org/jira/browse/HUDI-6114
Project: Apache Hudi
Issue Type: Bug
Reporter: Prashant Wason
Assignee: Prashant Wason
When a ROLLBACK command block is encountered, only the last log block is
considered for rollback. This may not work with multiple writers, where the
rollback may be applicable to an older block.
E.g. assume two processes, P1 and P2, are writing data to the MOR table. P1
started at time t1 and P2 started at t2. Let's assume P1 writes its log block
and then P2 writes its log block.
So the log file has two blocks now [LBlock1(instantTime=t1),
LBlock2(instantTime=t2)]
If P1 fails after writing to the log file but before its commit is created,
the inflight commit at t1 will eventually be rolled back. In that case a
rollback block is written, and the log file looks like this:
[LBlock1(instantTime=t1), LBlock2(instantTime=t2), LBlock(Rollback block with
targetInstantTime=t1)]
The current AbstractHoodieLogRecordReader code will not roll back LBlock1, as
it only applies the rollback to the last block.
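
The scenario above can be reproduced with a small standalone sketch. This is
not Hudi code: Block, rollbackLastOnly, rollbackAllMatching, names and
sampleLog are hypothetical names used only for illustration, and the "scan all
queued blocks" variant is one possible approach, not necessarily the fix that
will be adopted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RollbackSketch {
    // Hypothetical stand-in for a data log block; not Hudi's HoodieLogBlock.
    static final class Block {
        final String name;
        final String instantTime;

        Block(String name, String instantTime) {
            this.name = name;
            this.instantTime = instantTime;
        }
    }

    // Behavior described in this issue: only the last queued block is
    // compared against the rollback's target instant time.
    static void rollbackLastOnly(Deque<Block> queued, String targetInstantTime) {
        Block last = queued.peekLast();
        if (last != null && last.instantTime.equals(targetInstantTime)) {
            queued.pollLast();
        }
    }

    // One possible alternative (an assumption, not the committed fix):
    // drop every queued block written by the target instant.
    static void rollbackAllMatching(Deque<Block> queued, String targetInstantTime) {
        queued.removeIf(b -> b.instantTime.equals(targetInstantTime));
    }

    // Names of the blocks that survive, in log order.
    static List<String> names(Deque<Block> queued) {
        List<String> out = new ArrayList<>();
        for (Block b : queued) {
            out.add(b.name);
        }
        return out;
    }

    // The two data blocks from the example: P1 wrote at t1, then P2 at t2.
    static Deque<Block> sampleLog() {
        Deque<Block> queued = new ArrayDeque<>();
        queued.addLast(new Block("LBlock1", "t1"));
        queued.addLast(new Block("LBlock2", "t2"));
        return queued;
    }

    public static void main(String[] args) {
        Deque<Block> lastOnly = sampleLog();
        rollbackLastOnly(lastOnly, "t1");
        System.out.println(names(lastOnly)); // [LBlock1, LBlock2]

        Deque<Block> allMatching = sampleLog();
        rollbackAllMatching(allMatching, "t1");
        System.out.println(names(allMatching)); // [LBlock2]
    }
}
```

With the last-block-only check, LBlock1 from the failed writer stays in the
queue and would still be merged; scanning all queued blocks removes it while
keeping LBlock2.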