prashantwason opened a new pull request, #8604: URL: https://github.com/apache/hudi/pull/8604
[HUDI-6151] Rollback previously applied commits to MDT when operations are retried.

### Change Logs

Operations such as Clean and Compaction are retried after a failure with the same instant time. If the previous run committed successfully to the metadata table (MDT) but failed to commit to the dataset, the retry reuses the same instantTime and applies duplicate updates to the MDT. Currently, we simply delete the completed deltacommit without rolling it back. To handle this, we detect the replay of an operation and roll back any changes that operation made in the MDT.

### Impact

Fixes the issue of duplicate log blocks being written to the MDT. Duplicates are detrimental for indexes that do not allow them.

### Risk level (write none, low medium or high below)

None. A unit test has been added.

### Documentation Update

None

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
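
The replay handling described under Change Logs can be illustrated with a minimal sketch. This is a toy in-memory model, not the code in this PR: `MdtReplaySketch`, `LogBlock`, `apply`, and `rollback` are hypothetical names, and the sketch removes log blocks directly, whereas the actual MDT rollback mechanics may differ. It only shows the idea that a retried instant is detected and its earlier changes are rolled back before the updates are applied again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy in-memory model of the MDT; none of these names are Hudi APIs.
class MdtReplaySketch {

    // One log block written to the toy MDT, tagged with the instant that wrote it.
    static final class LogBlock {
        final String instantTime;
        final String payload;

        LogBlock(String instantTime, String payload) {
            this.instantTime = instantTime;
            this.payload = payload;
        }
    }

    // Instants whose deltacommit completed on the toy MDT.
    private final Set<String> completedInstants = new HashSet<>();
    // All log blocks currently present in the toy MDT.
    private final List<LogBlock> logBlocks = new ArrayList<>();

    // Applies an operation's updates. Returns true if the instant was a replay,
    // in which case the earlier attempt's blocks are rolled back first.
    boolean apply(String instantTime, List<String> payloads) {
        boolean replay = completedInstants.contains(instantTime);
        if (replay) {
            rollback(instantTime);
        }
        for (String payload : payloads) {
            logBlocks.add(new LogBlock(instantTime, payload));
        }
        completedInstants.add(instantTime);
        return replay;
    }

    // Removes every block written by the instant and marks it as not completed.
    void rollback(String instantTime) {
        logBlocks.removeIf(block -> block.instantTime.equals(instantTime));
        completedInstants.remove(instantTime);
    }

    int blockCount() {
        return logBlocks.size();
    }

    public static void main(String[] args) {
        MdtReplaySketch mdt = new MdtReplaySketch();
        List<String> cleanUpdates = Arrays.asList("delete file-1", "delete file-2");

        // First attempt commits to the MDT; assume the dataset commit then fails.
        mdt.apply("001", cleanUpdates);
        // Retry with the same instant time: detected as a replay.
        boolean replayed = mdt.apply("001", cleanUpdates);

        System.out.println("replay=" + replayed + ", blocks=" + mdt.blockCount());
    }
}
```

In this model, removing only the entry in `completedInstants` and leaving the blocks in place would correspond to the earlier behavior of deleting the completed deltacommit without a rollback, and the retry would leave four blocks instead of two.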
