sivabalan narayanan created HUDI-6761:
-----------------------------------------

             Summary: Fix rollbacks with MDT for MOR data table with log files
                 Key: HUDI-6761
                 URL: https://issues.apache.org/jira/browse/HUDI-6761
             Project: Apache Hudi
          Issue Type: Bug
          Components: metadata
            Reporter: sivabalan narayanan


There are a few rollback scenarios in which some log files from the data table 
(DT) can fail to sync to the metadata table (MDT). Especially for the cleaner's 
purposes, every valid file in the data table (anything that can be seen with 
fs.listStatus) must be synced to MDT. We can't afford to miss any log files. 
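
To make the requirement concrete, here is a minimal sketch of the check, not 
Hudi code. It assumes the DT file system listing and the MDT file listing are 
both available as plain dicts of partition -> file names. The function name, 
partition name, and file names are hypothetical.

```python
def files_missing_from_mdt(dt_listing, mdt_listing):
    """Return, per partition, data-table files that the MDT does not track.

    Both arguments are hypothetical stand-ins: dicts mapping a partition
    path to the file names found there (DT via a file system listing,
    MDT via its own file listing).
    """
    missing = {}
    for partition, files in dt_listing.items():
        untracked = set(files) - set(mdt_listing.get(partition, ()))
        if untracked:
            missing[partition] = sorted(untracked)
    return missing


# Illustrative data only: lf2.log exists in DT but was never synced to MDT.
dt = {"p1": ["bf1.parquet", "lf1.log", "lf2.log"]}
mdt = {"p1": ["bf1.parquet", "lf1.log"]}
print(files_missing_from_mdt(dt, mdt))
```

An empty result would mean every file visible in DT is tracked in MDT, which 
is the state the fix should guarantee after a rollback is synced.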

 

Two major gaps need to be fixed. 

1. Log files from the original commit being rolled back. 

For example, t5.dc fails midway in DT after adding lf2. 

We start a rollback commit t6.rb. When t6 syncs to MDT, we should also track 
lf2 and ensure it is synced to MDT. 

2. Log files added by previous attempts of the rollback. 

In the above scenario, the rollback could have added a log file (containing 
the rollback command block) called lf3. 

But if the rollback failed and is re-attempted, the new attempt could add 
another file called lf4. So, when this rollback syncs to MDT, we need to 
somehow ensure lf3 is also synced without a miss. 
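
The two gaps above can be sketched as a single union. This is only an 
illustration under stated assumptions, not the actual fix: the function and 
its three inputs are hypothetical, and how each list would be obtained in 
Hudi is left open.

```python
def log_files_to_sync(failed_commit_logs, prior_attempt_logs, current_attempt_logs):
    """Combine every log file a rollback must report to MDT.

    Hypothetical inputs:
      failed_commit_logs   - log files written by the commit being rolled back
      prior_attempt_logs   - log files written by earlier, failed rollback attempts
      current_attempt_logs - log files written by the current rollback attempt
    Order is preserved and duplicates are dropped.
    """
    ordered = []
    seen = set()
    for name in [*failed_commit_logs, *prior_attempt_logs, *current_attempt_logs]:
        if name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered


# Scenario from this issue: t5.dc added lf2, the first attempt of t6.rb
# added lf3, and the re-attempt added lf4.
files = log_files_to_sync(["lf2"], ["lf3"], ["lf4"])
print(files)
```

Syncing only the last list would miss lf2 and lf3, which are exactly the two 
gaps described above.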

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
