hudi-bot opened a new issue, #16208:
URL: https://github.com/apache/hudi/issues/16208

   In a few rollback scenarios, some log files from the data table (DT) can be 
missed when syncing to the metadata table (MDT). Especially for the cleaner's 
sake, every valid file in the data table (anything visible via fs.listStatus) 
should be synced to the MDT. We can't afford to miss any log files. 
   
    
   
   Two major gaps need to be fixed: 
   
   1. Log files from the original commit being rolled back. 
   
   For example, t5.dc fails midway in the DT after adding lf2. 
   
   We then start a rollback commit, t6.rb. When t6 syncs to the MDT, it should 
also track lf2 and ensure it is synced to the MDT. 
   
   2. Log files added by previous rollback attempts. 
   
   In the scenario above, the rollback could have added a log file (a rollback 
command block) called lf3. 
   
   If that rollback failed and is re-attempted, the retry could add another 
file, lf4. So when this rollback syncs to the MDT, we need to ensure lf3 is 
also synced and not missed. 
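
   The two gaps above can be illustrated with a minimal sketch. This is not 
Hudi's implementation or API: the function name `log_files_to_sync` and all of 
its parameters are hypothetical, and plain file names stand in for real log 
file paths. It only shows the intended invariant, namely that the rollback's 
sync should cover log files from the failed commit and from every earlier 
rollback attempt, as long as they are still visible on storage and not yet 
tracked in the MDT.

```python
from typing import Iterable, List


def log_files_to_sync(
    failed_commit_files: Iterable[str],    # e.g. lf2, added by t5.dc before it failed
    prior_attempt_files: Iterable[str],    # e.g. lf3, added by an earlier failed rollback attempt
    current_attempt_files: Iterable[str],  # e.g. lf4, added by the current rollback attempt
    listed_on_storage: Iterable[str],      # what a storage listing of the file group returns
    tracked_in_mdt: Iterable[str],         # what the MDT already knows about
) -> List[str]:
    """Hypothetical helper: return the log files the rollback must sync to the MDT."""
    # Gap 1 (failed commit) and gap 2 (earlier rollback attempts) are both
    # candidates, together with the files of the current attempt.
    candidates = (
        set(failed_commit_files)
        | set(prior_attempt_files)
        | set(current_attempt_files)
    )
    # Only files that actually exist on storage are valid to track.
    present = candidates & set(listed_on_storage)
    # Skip anything the MDT already tracks.
    return sorted(present - set(tracked_in_mdt))


# Scenario from the description: lf1 is already tracked, t5.dc added lf2,
# the first rollback attempt added lf3, and the retry added lf4.
missing = log_files_to_sync(
    failed_commit_files=["lf2"],
    prior_attempt_files=["lf3"],
    current_attempt_files=["lf4"],
    listed_on_storage=["lf1", "lf2", "lf3", "lf4"],
    tracked_in_mdt=["lf1"],
)
print(missing)
```

   In this sketch, a sync that considered only `current_attempt_files` would 
report lf4 alone, which is exactly the miss described in both gaps. 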
   
    
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-6761
   - Type: Bug
   - Fix version(s):
     - 1.1.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.