Prashant Wason created HUDI-6114:
------------------------------------

             Summary: Rollback handling in AbstractHoodieLogRecordReader may 
not work correctly when multi-writer is enabled
                 Key: HUDI-6114
                 URL: https://issues.apache.org/jira/browse/HUDI-6114
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Prashant Wason
            Assignee: Prashant Wason


When a ROLLBACK command block is encountered, only the last log block is 
considered for rollback. This may not work with multiple writers, where the 
rollback may be applicable to an older block.

E.g. assume two processes, P1 and P2, are writing data to the MOR table. P1 
started at time t1 and P2 started at t2. Let's assume P1 writes its log block 
first and then P2 writes its log block.

 

So the log file has two blocks now [LBlock1(instantTime=t1), 
LBlock2(instantTime=t2)]

If P1 fails after writing to the log file but before its commit is created, 
the inflight commit at t1 will eventually be rolled back. In that case a 
rollback block is written, and the log file looks like this:

[LBlock1(instantTime=t1), LBlock2(instantTime=t2), LBlock(Rollback block with 
targetInstantTime=t1)]

 

The current AbstractHoodieLogRecordReader code will not roll back LBlock1, as 
it only applies a rollback to the last block.
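
The scenario above can be reproduced with a small standalone model. This is 
only a sketch: it is not Hudi code, and the names Block, scan_last_block_only 
and scan_all_matching are hypothetical. It assumes a log file is an ordered 
list of blocks and contrasts a reader that checks only the last queued block 
against one that checks every queued block for the rollback's target instant. 
The second reader illustrates the expected behaviour, not necessarily the 
eventual fix.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    # Hypothetical stand-in for a log block; not a Hudi class.
    kind: str          # "data" or "rollback"
    instant_time: str  # data: writer's instant; rollback: target instant


def scan_last_block_only(blocks: List[Block]) -> List[Block]:
    """Model of the behaviour described in this issue: a rollback block is
    compared only against the most recently queued block."""
    queued: List[Block] = []
    for block in blocks:
        if block.kind == "rollback":
            if queued and queued[-1].instant_time == block.instant_time:
                queued.pop()
        else:
            queued.append(block)
    return queued


def scan_all_matching(blocks: List[Block]) -> List[Block]:
    """Model of the expected behaviour: a rollback block removes every
    queued block written by its target instant, wherever it sits."""
    queued: List[Block] = []
    for block in blocks:
        if block.kind == "rollback":
            queued = [q for q in queued if q.instant_time != block.instant_time]
        else:
            queued.append(block)
    return queued


# [LBlock1(t1), LBlock2(t2), rollback block targeting t1]
log_file = [Block("data", "t1"), Block("data", "t2"), Block("rollback", "t1")]

print([b.instant_time for b in scan_last_block_only(log_file)])  # ['t1', 't2']
print([b.instant_time for b in scan_all_matching(log_file)])     # ['t2']
```

In the first reader the last queued block belongs to t2, so the rollback for 
t1 is a no-op and the data from the failed writer stays visible.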



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
