[ 
https://issues.apache.org/jira/browse/HUDI-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2792:
--------------------------------------
        Parent: HUDI-1292
    Issue Type: Sub-task  (was: Bug)

> Metadata table enters into inconsistent state
> ---------------------------------------------
>
>                 Key: HUDI-2792
>                 URL: https://issues.apache.org/jira/browse/HUDI-2792
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: sivabalan narayanan
>            Priority: Major
>
> I see we have validations to ensure the metadata table is in a valid state. 
> Specifically, if a file that was never added is deleted from the metadata 
> table, we throw an exception. 
> I was able to reproduce this issue in one of my test scenarios. Even though 
> the actual test case is a bit tangential, here is a convincing case that 
> requires relaxing this constraint. 
>  
> Due to Spark task failures, there can be more files in the system than 
> are tracked in the commit metadata. So, if a user tries to roll back a 
> completed write (which had some Spark task failures), the rollback will 
> include more files than the initial set of files added as part of the 
> commit metadata.
> So, we need to relax this constraint (if a file that was never added is 
> deleted from the metadata table, we throw an exception). If not, I 
> cannot think of a way to get around this. 
>  
> I am looking for ideas on how to go about this. Can we add some minimal 
> constraint, but loosen the existing one so that we support the Spark task 
> failure cases? 
>  
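
The scenario described above can be sketched in a few lines. This is a toy model only, not Hudi's actual metadata table code: the class `FileListing`, its methods, the `strict` flag, and the file names are all hypothetical, introduced purely for illustration. It shows how a strict "never delete what was never added" check fails on a rollback that lists a file left behind by a failed Spark task, and how one possible relaxation (tolerate and record such deletes) would behave.

```python
class MetadataValidationError(Exception):
    """Raised when a delete refers to a file the listing never tracked."""


class FileListing:
    """Toy stand-in for the per-partition file listing (hypothetical, not Hudi API)."""

    def __init__(self, strict=True):
        self.strict = strict
        self.files = set()
        self.ignored_deletes = []  # untracked deletes tolerated in relaxed mode

    def apply_commit(self, added_files):
        # Only files recorded in the commit metadata are ever added here.
        self.files.update(added_files)

    def apply_rollback(self, deleted_files):
        untracked = sorted(set(deleted_files) - self.files)
        if untracked and self.strict:
            # The existing constraint: deleting a never-added file is an error.
            raise MetadataValidationError(f"deleted but never added: {untracked}")
        # Relaxed behaviour (an assumption): remember the extras, then proceed.
        self.ignored_deletes.extend(untracked)
        self.files.difference_update(deleted_files)


# The commit metadata tracks two files, but a failed task attempt left a third.
committed = {"f1.parquet", "f2.parquet"}
on_storage = committed | {"f2_failed_attempt.parquet"}

strict = FileListing(strict=True)
strict.apply_commit(committed)
try:
    strict.apply_rollback(on_storage)  # rollback lists every file it found
    strict_failed = False
except MetadataValidationError:
    strict_failed = True

relaxed = FileListing(strict=False)
relaxed.apply_commit(committed)
relaxed.apply_rollback(on_storage)
```

In this sketch the strict listing rejects the rollback, while the relaxed listing ends up empty and keeps the untracked file name in `ignored_deletes`. Recording such deletes rather than dropping them silently is just one candidate for the "minimal constraint" asked about above.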



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
