sivabalan narayanan created HUDI-2792:
-----------------------------------------
Summary: Metadata table enters into inconsistent state
Key: HUDI-2792
URL: https://issues.apache.org/jira/browse/HUDI-2792
Project: Apache Hudi
Issue Type: Bug
Reporter: sivabalan narayanan
We have validations to ensure the metadata table is in a valid state.
Specifically, if a file that was never added is deleted from the metadata
table, we throw an exception.
I was able to reproduce this issue in one of my test scenarios. Even though
the actual test case is a bit tangential, here is a convincing case that
requires relaxing this constraint.
Due to Spark task failures, there can be more files in the system than are
tracked in the commit metadata. So, if a user tries to roll back a completed
write (which had some Spark task failures), the rollback will contain more
files than the initial set of files added as part of the commit metadata.
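To make the scenario concrete, here is a minimal sketch of how a strict
"delete only what was added" check rejects such a rollback. The class
StrictFileListing, its methods, and the file names are hypothetical and only
model the idea; they are not Hudi's actual metadata table classes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a file listing with the strict validation.
// This is NOT Hudi code; names are invented for illustration.
class StrictFileListing {
    private final Set<String> trackedFiles = new HashSet<>();

    // Record the files listed in the commit metadata.
    void applyCommit(List<String> addedFiles) {
        trackedFiles.addAll(addedFiles);
    }

    // Remove the files listed in the rollback metadata.
    // Throws on the first file that was never tracked.
    void applyRollback(List<String> deletedFiles) {
        for (String file : deletedFiles) {
            if (!trackedFiles.remove(file)) {
                throw new IllegalStateException(
                    "Deleting a file that was never added: " + file);
            }
        }
    }

    public static void main(String[] args) {
        StrictFileListing listing = new StrictFileListing();
        // The commit metadata only knows about two files.
        listing.applyCommit(Arrays.asList("f1.parquet", "f2.parquet"));
        try {
            // A failed Spark task attempt left f3.parquet on storage, so the
            // rollback lists one more file than the commit ever added.
            listing.applyRollback(
                Arrays.asList("f1.parquet", "f2.parquet", "f3.parquet"));
        } catch (IllegalStateException e) {
            System.out.println("Rollback rejected: " + e.getMessage());
        }
    }
}
```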
So, we need to relax this constraint (if a file that was never added is
deleted from the metadata table, we throw an exception). If not, I cannot
think of a way to get around this.
I am looking for ideas on how to go about this. Can we add some minimal
constraint but loosen the existing one, so that we support the Spark task
failure cases?
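One possible direction, offered only as a sketch for discussion: tolerate
deletes of files that were never tracked, and report them to the caller
instead of throwing. The class RelaxedFileListing and its methods are
hypothetical and do not correspond to existing Hudi APIs. The remaining
invariant here is an assumption: after a rollback is applied, none of the
files it lists are still tracked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a file listing with a relaxed validation.
// This is NOT Hudi code; names are invented for illustration.
class RelaxedFileListing {
    private final Set<String> trackedFiles = new HashSet<>();

    // Record the files listed in the commit metadata.
    void applyCommit(List<String> addedFiles) {
        trackedFiles.addAll(addedFiles);
    }

    // Remove every listed file that is tracked. Files that were never
    // tracked are skipped and returned, so the caller can log them.
    List<String> applyRollback(List<String> deletedFiles) {
        List<String> untracked = new ArrayList<>();
        for (String file : deletedFiles) {
            if (!trackedFiles.remove(file)) {
                untracked.add(file);
            }
        }
        return untracked;
    }

    boolean isTracked(String file) {
        return trackedFiles.contains(file);
    }

    public static void main(String[] args) {
        RelaxedFileListing listing = new RelaxedFileListing();
        listing.applyCommit(Arrays.asList("f1.parquet", "f2.parquet"));
        // f3.parquet stands for a file left behind by a failed task attempt.
        List<String> untracked = listing.applyRollback(
            Arrays.asList("f1.parquet", "f2.parquet", "f3.parquet"));
        System.out.println("Ignored untracked deletes: " + untracked);
    }
}
```

Returning the skipped files rather than silently dropping them keeps some
visibility into unexpected deletes, which could still be logged or checked
against a stricter rule for operations other than rollback.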
--
This message was sent by Atlassian Jira
(v8.20.1#820001)