[
https://issues.apache.org/jira/browse/HUDI-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Guo updated HUDI-6761:
------------------------------
Fix Version/s: 1.0.2
> Fix rollbacks with MDT for MOR data table with log files
> --------------------------------------------------------
>
> Key: HUDI-6761
> URL: https://issues.apache.org/jira/browse/HUDI-6761
> Project: Apache Hudi
> Issue Type: Bug
> Components: metadata
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Fix For: 1.0.2
>
>
> There are a few rollback scenarios in which some log files from the data table
> could fail to sync to the MDT. Especially for the cleaner, every valid file in the
> data table (anything that can be seen with fs.listStatus) should be synced to the
> MDT. We can't afford to miss any log files.
>
> Two major gaps need to be fixed:
> 1. Log files from the original commit being rolled back.
> For example, t5.dc fails midway in the data table (DT) after adding lf2.
> We start a rollback commit t6.rb. When t6 syncs to the MDT, we should also track
> lf2 and ensure it is synced to the MDT.
> 2. Log files added by previous attempts of the rollback.
> In the above scenario, the rollback could have added a log file (rollback command
> block) called lf3.
> But if the rollback failed and is re-attempted, it could add another file
> called lf4. So, when this rollback syncs to the MDT, we need to ensure
> lf3 is also synced and not missed.
>
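The two gaps above can be illustrated with a minimal sketch. It is not Hudi code: `RollbackAttempt`, `log_files_to_sync`, and `missing_from_mdt` are hypothetical names, and file names are modeled as plain strings. It assumes the rollback sync should carry the union of the log files written by the commit being rolled back, by every earlier attempt of the same rollback, and by the current attempt.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RollbackAttempt:
    """One attempt of a rollback instant (hypothetical model, not a Hudi class)."""
    instant: str
    log_files_written: set[str] = field(default_factory=set)


def log_files_to_sync(
    rolled_back_commit_logs: set[str],
    previous_attempts: list[RollbackAttempt],
    current_attempt: RollbackAttempt,
) -> set[str]:
    """Collect every log file the rollback should report to the MDT."""
    # Gap 1: log files added by the failed commit that is being rolled back.
    files = set(rolled_back_commit_logs)
    # Gap 2: log files left behind by earlier, failed attempts of this rollback.
    for attempt in previous_attempts:
        files |= attempt.log_files_written
    # Log files written by the attempt that finally completes.
    files |= current_attempt.log_files_written
    return files


def missing_from_mdt(listed_on_storage: set[str], tracked_in_mdt: set[str]) -> set[str]:
    """Files visible in a storage listing that the MDT does not track."""
    return set(listed_on_storage) - set(tracked_in_mdt)


# Scenario from the description: t5.dc added lf2, the first t6.rb attempt
# added lf3, and the re-attempt added lf4.
first_try = RollbackAttempt("t6.rb", {"lf3"})
retry = RollbackAttempt("t6.rb", {"lf4"})
synced = log_files_to_sync({"lf2"}, [first_try], retry)
```

With this union, `synced` holds lf2, lf3, and lf4. If only the final attempt's file (lf4) were synced, `missing_from_mdt` would report lf2 and lf3 as present on storage but absent from the MDT, which is the situation the description says must be avoided.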
--
This message was sent by Atlassian Jira
(v8.20.10#820010)