[
https://issues.apache.org/jira/browse/HUDI-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan updated HUDI-2792:
--------------------------------------
Priority: Blocker (was: Major)
> Metadata table enters into inconsistent state
> ---------------------------------------------
>
> Key: HUDI-2792
> URL: https://issues.apache.org/jira/browse/HUDI-2792
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Blocker
> Fix For: 0.10.0
>
>
> I see we have validations to ensure the metadata table is in a valid state.
> Specifically, if a file that was never added is deleted from the metadata
> table, we throw an exception.
> I was able to reproduce this issue in one of my test scenarios. Even though
> the actual test case is a bit tangential, here is a convincing case that
> requires relaxing this constraint.
>
> Due to Spark task failures, there can be more files in the system than are
> tracked in the commit metadata. So, if a user tries to roll back a
> completed write (which had some Spark task failures), the rollback will list
> more files than the initial set of files added as part of the commit
> metadata.
> So, we need to relax this constraint (if a file that was never added is
> deleted from the metadata table, we throw an exception). If not, I
> cannot think of a way to get around this.
>
> Looking for ideas on how to go about this. Can we add some minimal
> constraint but loosen the existing one, so that we support the Spark task
> failure cases?
>
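A minimal sketch of the constraint described above and of one way it might be relaxed. This is not Hudi's code: the function `apply_deletes`, the `strict` flag, and the file names are hypothetical, and the metadata table's file listing is modeled as a plain set of file names for a single partition.

```python
from typing import Iterable, List, Set


def apply_deletes(tracked: Set[str], deletes: Iterable[str], strict: bool = True) -> List[str]:
    """Remove `deletes` from `tracked` (hypothetical model of a file listing).

    Returns the deleted names that were never tracked. In strict mode such
    names raise an error instead, mirroring the existing validation.
    """
    deletes = list(deletes)
    unknown = [f for f in deletes if f not in tracked]
    if strict and unknown:
        # Existing behavior as described in the issue: deleting a file
        # that was never added is treated as an inconsistency.
        raise ValueError(f"delete of untracked files: {sorted(unknown)}")
    # Relaxed behavior (assumption): drop what is tracked, report the rest.
    tracked.difference_update(deletes)
    return unknown


# Files recorded in the commit metadata of a completed write.
tracked = {"f1.parquet", "f2.parquet"}
# The rollback also lists f3, left behind by a failed Spark task attempt.
rollback_files = ["f1.parquet", "f2.parquet", "f3.parquet"]
never_added = apply_deletes(tracked, rollback_files, strict=False)
```

With `strict=True` the same rollback raises, which is the failure described in the issue. Returning the untracked names rather than silently dropping them would leave room for a weaker check, such as logging them.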
--
This message was sent by Atlassian Jira
(v8.20.1#820001)