ZiyueGuan created HUDI-2917:
-------------------------------

             Summary: Rollback may be incorrect for canIndexLogFile index
                 Key: HUDI-2917
                 URL: https://issues.apache.org/jira/browse/HUDI-2917
             Project: Apache Hudi
          Issue Type: Bug
          Components: Common Core
            Reporter: ZiyueGuan


Problem:

After a rollback, the Hudi table may still contain data that should have been 
rolled back.

Root cause:

Let's first recall how the rollback plan is generated for the log blocks of a 
deltaCommit. Hudi takes two cases into consideration.
 # Log files with no base file consist entirely of insert records, so they are 
deleted directly. Here we assume that all inserted records are covered this way.
 # For each fileID that is marked as updated in the inflight commit metadata of 
the instant we want to roll back, we append a command block to its log file to 
roll it back. This handles all updated records.
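
The two cases above can be sketched as a small decision function. This is a 
simplified illustration of the description in this issue, not Hudi's actual 
rollback code: the class RollbackPlanSketch, the Action enum, the planFor 
method, and the file IDs are all hypothetical names invented for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model. None of these names are real Hudi classes.
class RollbackPlanSketch {

    enum Action { DELETE_LOG_FILE, APPEND_ROLLBACK_BLOCK, NO_ACTION }

    // Decide what the rollback does for one file group touched by a deltaCommit.
    static Action planFor(String fileId,
                          boolean hasBaseFile,
                          Set<String> updatedFileIdsInInflightMeta) {
        if (!hasBaseFile) {
            // Case 1: log file with no base file, assumed to hold only inserts.
            return Action.DELETE_LOG_FILE;
        }
        if (updatedFileIdsInInflightMeta.contains(fileId)) {
            // Case 2: file ID is listed as updated in the inflight commit metadata.
            return Action.APPEND_ROLLBACK_BLOCK;
        }
        // Neither case matches, so this model plans nothing for the file group.
        return Action.NO_ACTION;
    }

    public static void main(String[] args) {
        Set<String> updated = new HashSet<>(Arrays.asList("fg-2"));
        System.out.println(planFor("fg-1", false, updated)); // DELETE_LOG_FILE
        System.out.println(planFor("fg-2", true, updated));  // APPEND_ROLLBACK_BLOCK
        System.out.println(planFor("fg-3", true, updated));  // NO_ACTION
    }
}
```

In this model, a file group that has a base file but is absent from the 
inflight commit metadata receives no rollback action at all.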

However, the first assumption does not always hold. Indexes that can index 
log files (canIndexLogFile) may insert records into an existing log file. In 
the current process, the inflight hoodieCommitMeta is generated before the 
inserted records are assigned to a specific file group, so it does not list 
those file groups and their inserted records are not rolled back.

 

Fix: 

To fix this problem, we need to generate hoodieCommitMeta from the result of 
the partitioner rather than from workProfile. We may also need more comments 
in the rollback code to call out this case.
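
A minimal sketch of why the source of the metadata matters, under the 
assumption that workProfile only knows which file groups receive updates, 
while the partitioner result also records which file groups were chosen for 
inserts. The class InflightMetaSketch, both methods, and the file IDs are 
hypothetical and do not correspond to Hudi APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model. None of these names are real Hudi classes.
class InflightMetaSketch {

    // Assumed current behaviour: inserts are not yet assigned to a file group,
    // so only file groups that receive updates can be listed.
    static Set<String> fileIdsFromWorkProfile(List<String> updatedFileIds) {
        return new TreeSet<>(updatedFileIds);
    }

    // Proposed behaviour: after partitioning, the file groups chosen for
    // inserts are known and can be listed as well.
    static Set<String> fileIdsFromPartitioner(List<String> updatedFileIds,
                                              List<String> insertTargetFileIds) {
        Set<String> ids = new TreeSet<>(updatedFileIds);
        ids.addAll(insertTargetFileIds);
        return ids;
    }

    public static void main(String[] args) {
        List<String> updated = Arrays.asList("fg-2");
        // An existing file group that the partitioner picked for new inserts.
        List<String> insertTargets = Arrays.asList("fg-3");

        System.out.println(fileIdsFromWorkProfile(updated));                // [fg-2]
        System.out.println(fileIdsFromPartitioner(updated, insertTargets)); // [fg-2, fg-3]
    }
}
```

With the second variant, a rollback that consults the inflight metadata would 
also see fg-3 and could append a rollback block to its log file.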



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
