sivabalan narayanan created HUDI-3604:
-----------------------------------------

             Summary: Missing to apply rollback commits to Metadata table
                 Key: HUDI-3604
                 URL: https://issues.apache.org/jira/browse/HUDI-3604
             Project: Apache Hudi
          Issue Type: Bug
          Components: metadata
            Reporter: sivabalan narayanan


Consider a data table timeline with commits C1, C2, C3, followed by C4 
(RB_C1), the rollback of C1. 

When C4 (i.e. the rollback of C1) is triggered, suppose the process crashes 
after deleting the data files and after deleting C1's commit files from the 
timeline, but before applying the rollback to the metadata table (MDT). 

Even if the user restarts the pipeline, there won't be any pending failed 
commits to roll back, so new commits will proceed without regard to C4. The 
metadata table, however, will never receive this rollback commit. 
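
The failure sequence above can be reproduced with a toy model. This is only 
a sketch: plain Python dicts and sets stand in for the data table timeline, 
the data files, and the MDT file listing, and none of the names below are 
Hudi APIs. It also assumes that C1 is a failed (inflight) commit whose file 
f1 is already listed in the MDT. 

```python
class Crash(Exception):
    """Simulated process crash."""


def rollback(timeline, data_files, mdt_files, target, crash_before_mdt=False):
    # Order described in the issue: data files, then timeline, then MDT.
    removed = {f for f, instant in data_files.items() if instant == target}
    for f in removed:
        del data_files[f]            # 1. delete the data files written by target
    del timeline[target]             # 2. delete target's commit files from the timeline
    if crash_before_mdt:
        raise Crash()                # process dies before the MDT is updated
    mdt_files.difference_update(removed)  # 3. apply the rollback to the MDT


# Hypothetical state: C1 failed, C2 and C3 completed.
timeline = {"C1": "inflight", "C2": "completed", "C3": "completed"}
data_files = {"f1": "C1", "f2": "C2", "f3": "C3"}
mdt_files = {"f1", "f2", "f3"}  # assumption: the MDT already lists f1

try:
    rollback(timeline, data_files, mdt_files, "C1", crash_before_mdt=True)
except Crash:
    pass

# What a restarted pipeline would see.
pending = [i for i, state in timeline.items() if state != "completed"]
stale = mdt_files - set(data_files)
print("pending failed commits:", pending)
print("files listed in MDT but gone from storage:", stale)
```

In this model the restart finds nothing to roll back, while the MDT still 
lists a file that no longer exists. 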

 

Proposal: 

We need at least two fixes: 

a. We should clean the C1 commit files from the data table timeline only 
after applying the rollback commit to the MDT. This ensures that no commit 
files in the data table are cleaned up before the rollback has been applied 
to the MDT. 
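
A minimal sketch of the ordering in (a), again using a toy model rather than 
Hudi code. It assumes the list of files to remove (`plan`, a hypothetical 
name) is recorded before the rollback executes, so that a retry can still 
update the MDT after the data files are gone, and that each step is safe to 
repeat. 

```python
class Crash(Exception):
    """Simulated process crash."""


def rollback_mdt_first(timeline, data_files, mdt_files, target, plan, crash_at=None):
    for f in plan:
        data_files.pop(f, None)        # 1. delete data files (safe to repeat)
    if crash_at == "before_mdt":
        raise Crash()
    mdt_files.difference_update(plan)  # 2. apply the rollback to the MDT
    timeline.pop(target, None)         # 3. only now clean target from the timeline


def pending_instants(timeline):
    return [i for i, state in timeline.items() if state != "completed"]


# Hypothetical state: C1 failed, C2 and C3 completed.
timeline = {"C1": "inflight", "C2": "completed", "C3": "completed"}
data_files = {"f1": "C1", "f2": "C2", "f3": "C3"}
mdt_files = {"f1", "f2", "f3"}  # assumption: the MDT already lists f1
plan = {"f1"}                   # assumption: recorded before execution

try:
    rollback_mdt_first(timeline, data_files, mdt_files, "C1", plan,
                       crash_at="before_mdt")
except Crash:
    pass

# C1 is still on the timeline, so a restart can find it and retry.
pending_after_crash = pending_instants(timeline)
for instant in pending_after_crash:
    rollback_mdt_first(timeline, data_files, mdt_files, instant, plan)

print("pending after crash:", pending_after_crash)
print("MDT files after retry:", sorted(mdt_files))
```

Because C1 stays on the timeline until the MDT has been updated, a crash at 
the same point leaves work that the restarted pipeline can still discover. 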

b. Whenever we check for failed commits to roll back, we should also check 
for any dangling rollback that needs to be re-attempted. This in turn needs 
some fixes in the rollback executor, since the commit to roll back may no 
longer exist in the data table timeline at all, yet we still need to 
re-attempt the rollback and drive it to completion. It is not easy to 
distinguish a dangling rollback from a pending rollback, and I can't think 
of a way to detect a dangling rollback just by looking at the data table 
active timeline. 
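
One possible shape for the restart check in (b), as a toy model only. It 
assumes the rollback instant itself (C4) remains on the timeline in a 
non-completed state after the crash; whether that holds in practice is 
exactly the open question above. The record layout and function names are 
hypothetical, not Hudi APIs. 

```python
def work_on_restart(timeline):
    """Return (failed commits, non-completed rollbacks) found on the timeline."""
    failed_commits = [e["instant"] for e in timeline
                      if e["action"] == "commit" and e["state"] != "completed"]
    pending_rollbacks = [e["instant"] for e in timeline
                         if e["action"] == "rollback" and e["state"] != "completed"]
    return failed_commits, pending_rollbacks


def target_present(timeline, rollback_instant):
    """True if the commit that a rollback refers to is still on the timeline."""
    target = next(e["target"] for e in timeline
                  if e["instant"] == rollback_instant)
    return any(e["instant"] == target for e in timeline)


# Hypothetical timeline after the crash: C1 is already gone, C4 never finished.
timeline = [
    {"instant": "C2", "action": "commit", "state": "completed"},
    {"instant": "C3", "action": "commit", "state": "completed"},
    {"instant": "C4", "action": "rollback", "state": "inflight", "target": "C1"},
]

failed, rollbacks = work_on_restart(timeline)
print("failed commits:", failed)
print("rollbacks to re-attempt:", rollbacks)
print("target of C4 still present:", target_present(timeline, "C4"))
```

The last line shows the case the rollback executor would have to tolerate: 
the rollback must be re-attempted even though its target commit is absent. 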


--
This message was sent by Atlassian Jira
(v8.20.1#820001)
