ZiyueGuan created HUDI-2917:
-------------------------------
Summary: Rollback may be incorrect for canIndexLogFile index
Key: HUDI-2917
URL: https://issues.apache.org/jira/browse/HUDI-2917
Project: Apache Hudi
Issue Type: Bug
Components: Common Core
Reporter: ZiyueGuan
Problem:
After a rollback, the Hudi table may still contain data that should have been
rolled back.
Root cause:
Let's first recall how the rollback plan is generated for the log blocks of a
deltaCommit. Hudi takes two cases into consideration.
# Log files with no base file are assumed to consist entirely of inserted
records, so they are deleted directly. The assumption here is that every
inserted record is covered this way.
# For file IDs that were updated according to the inflight commit metadata of
the instant being rolled back, a command block is appended to their log files
to roll them back. This handles all updated records.
However, the first assumption does not always hold. An index that can index
log files may route inserted records to an existing log file. In the current
process, the inflight hoodieCommitMeta is generated before the records are
assigned to specific file groups, so such inserts can be missed by both cases
and survive the rollback.
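The gap can be illustrated with a small toy model. This is only a sketch of the two cases as described above, not Hudi's rollback code: FileGroup, plan_rollback, unhandled_file_ids, and the action names are hypothetical, and treating a log-only file group written solely by the instant as "all inserts" is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FileGroup:
    """Toy stand-in for a file group (hypothetical, not the Hudi API)."""
    file_id: str
    has_base_file: bool
    # Each log block is a (instant_time, record_keys) pair.
    log_blocks: list = field(default_factory=list)


def plan_rollback(file_groups, instant, updated_file_ids):
    """Return {file_id: action} for the two cases in the description.

    updated_file_ids models the file IDs listed as updated in the inflight
    commit metadata of the instant being rolled back.
    """
    plan = {}
    for fg in file_groups:
        times = [t for t, _ in fg.log_blocks]
        only_this_instant = bool(times) and all(t == instant for t in times)
        if not fg.has_base_file and only_this_instant:
            # Case 1: log-only file group written solely by this instant.
            plan[fg.file_id] = "DELETE_LOG_FILES"
        elif fg.file_id in updated_file_ids:
            # Case 2: listed as updated, so append a rollback command block.
            plan[fg.file_id] = "APPEND_ROLLBACK_BLOCK"
    return plan


def unhandled_file_ids(file_groups, instant, plan):
    """File groups the instant wrote to that the plan does not touch."""
    return [
        fg.file_id
        for fg in file_groups
        if fg.file_id not in plan
        and any(t == instant for t, _ in fg.log_blocks)
    ]


groups = [
    # New log-only file group created by instant 003.
    FileGroup("fg-new", False, [("003", ["k1"])]),
    # Existing log file from 002 that also received an insert in 003.
    FileGroup("fg-existing", False, [("002", ["k0"]), ("003", ["k2"])]),
]
# Metadata built before file-group assignment lists no updated file IDs.
plan = plan_rollback(groups, "003", updated_file_ids=set())
leaked = unhandled_file_ids(groups, "003", plan)
print(plan)
print(leaked)
```

In this model fg-existing is neither deleted nor given a rollback block, so the record inserted by instant 003 stays visible. Passing updated_file_ids={"fg-existing"} makes the plan cover it.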
Fix:
To fix this, hoodieCommitMeta should be generated from the result of the
partitioner rather than from the workProfile. The rollback code would also
benefit from additional comments that call out this case.
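As a rough sketch of why the source of the metadata matters, the toy functions below compare the file IDs that are known before partitioning with those known after it. All names and inputs here are hypothetical illustrations, not Hudi classes or the actual patch; the dictionaries simply map record keys to file IDs.

```python
def file_ids_from_work_profile(update_locations):
    """File IDs known before partitioning: only records tagged as updates."""
    return set(update_locations.values())


def file_ids_from_partitioner(update_locations, insert_assignments):
    """File IDs known after partitioning: updates plus routed inserts."""
    return set(update_locations.values()) | set(insert_assignments.values())


# Hypothetical inputs: record key -> file ID.
update_locations = {"k0": "fg-a"}                      # tagged by the index
insert_assignments = {"k1": "fg-b", "k2": "fg-new"}    # chosen by the partitioner
existing_file_ids = {"fg-a", "fg-b"}                   # present before this instant

before = file_ids_from_work_profile(update_locations)
after = file_ids_from_partitioner(update_locations, insert_assignments)

# Existing file groups that receive inserts but are absent from the
# metadata when it is derived before file-group assignment.
missed_existing = (after - before) & existing_file_ids
print(sorted(missed_existing))
```

Here fg-b is an existing file group that receives an insert, yet it only appears in the set derived after partitioning, which is the set a rollback would need in order to append a command block to it.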