[ 
https://issues.apache.org/jira/browse/HUDI-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-3604:
--------------------------------------
    Priority: Blocker  (was: Major)

> Missing to apply rollback commits to Metadata table
> ---------------------------------------------------
>
>                 Key: HUDI-3604
>                 URL: https://issues.apache.org/jira/browse/HUDI-3604
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Priority: Blocker
>
> Consider a timeline C1, C2, C3, C4, where C4 is the rollback of C1 (RB_C1).
> When C4 is triggered, suppose the process crashes after deleting the data
> files and after deleting C1's commit files from the timeline, but before
> applying the rollback to the metadata table (MDT).
> Even if the user restarts the pipeline, there won't be any pending failed
> commits to roll back, so new commits will proceed without accounting for
> C4, and the metadata table will never see this rollback commit.
>  
> Proposal:
> We need at least two fixes:
> a. We should clean the C1 commit files from the data table timeline only
> after applying the rollback commit to the MDT. This ensures that no commit
> files in the data table are cleaned up before the rollback is applied to
> the MDT.
> b. Whenever we check for failed commits to roll back, we should also check
> for any dangling rollback that needs to be re-attempted. This needs some
> fixes in the rollback executor as well, since the commit to roll back may
> no longer exist in the data table timeline at all, yet we still need to
> re-attempt the rollback and get it to completion. It is not easy to
> distinguish a pending rollback from a dangling rollback, so I can't think
> of a way to detect a dangling rollback just by looking at the data table
> active timeline.
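
The ordering problem and the two proposed fixes can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hudi code: `ToyTable`, `rollback_plans`, `recover`, and the state strings are hypothetical names invented for illustration. It assumes C1 is an inflight (failed) commit, that the rollback instant C4 stays inflight on the timeline after a crash, and that the rollback's target is persisted somewhere so it can be looked up when the target commit is already gone.

```python
from dataclasses import dataclass, field


class Crash(Exception):
    """Simulated process failure in the middle of a rollback."""


@dataclass
class ToyTable:
    # instant -> state; a stand-in for the data table's active timeline
    timeline: dict
    # rollback instants already applied to the metadata table (MDT)
    mdt_applied: set = field(default_factory=set)
    # rollback instant -> commit it rolls back (hypothetical persisted plan)
    rollback_plans: dict = field(default_factory=dict)

    def rollback(self, target, instant, mdt_first, crash=False):
        self.rollback_plans[instant] = target
        self.timeline[instant] = "rollback.inflight"
        # Deleting the data files is not modelled here.
        if not mdt_first:
            # Order described in the report: commit files are removed first.
            self.timeline.pop(target, None)
        if crash:
            raise Crash()  # process dies before the MDT is updated
        self.mdt_applied.add(instant)  # set semantics make re-applying harmless
        # Fix (a): the target's commit files are removed only after the MDT update.
        # pop(..., None) tolerates a target that is already gone.
        self.timeline.pop(target, None)
        self.timeline[instant] = "rollback.completed"

    def failed_commits(self):
        return [i for i, s in self.timeline.items() if s == "commit.inflight"]

    def pending_rollbacks(self):
        return [i for i, s in self.timeline.items() if s == "rollback.inflight"]

    def recover(self):
        # Fix (b): re-attempt unfinished rollbacks, even if the target is gone.
        for instant in self.pending_rollbacks():
            self.rollback(self.rollback_plans[instant], instant, mdt_first=True)


def crashed_table(mdt_first):
    """Build the C1, C2, C3 timeline and crash while rolling back C1 as C4."""
    table = ToyTable({"C1": "commit.inflight", "C2": "commit.completed",
                      "C3": "commit.completed"})
    try:
        table.rollback("C1", "C4", mdt_first=mdt_first, crash=True)
    except Crash:
        pass
    return table


old = crashed_table(mdt_first=False)
print(old.failed_commits(), old.mdt_applied)  # no failed commit left, MDT missed C4

new = crashed_table(mdt_first=True)
print(new.failed_commits())                   # C1 is still visible as a failed commit
new.recover()
print(new.timeline["C4"], "C4" in new.mdt_applied)
```

With the original ordering, a restart that only looks for failed commits finds nothing, which is the gap described above. In this toy, calling `old.recover()` would also bring the dangling C4 to completion, but only because the model assumes the rollback's target can be looked up independently of the timeline; the issue notes that detecting a dangling rollback from the active timeline alone is the hard part.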

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
