[ https://issues.apache.org/jira/browse/HUDI-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512826#comment-17512826 ]
Ethan Guo commented on HUDI-3604:
---------------------------------
Case (a) will be fixed by the PR.
For case (b), we already have logic that retrieves any requested and inflight
rollbacks that did not finish earlier and reattempts them. However, there's
a bug in the reattempt logic when the instant to roll back has already had its
instant files deleted from the active timeline. That bug is tracked separately
in HUDI-3720.
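The reattempt behaviour described for case (b) can be sketched with a toy in-memory model. This is a hypothetical illustration, not Hudi's rollback executor: `Table`, `plans`, `pending_rollbacks` and `reattempt_rollback` are made-up names, and the sketch assumes each rollback instant records which commit it targets. The point it shows is that reattempting must not depend on the target commit still being on the active timeline, which is the situation the comment says HUDI-3720 tracks.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    # Toy model, not Hudi's API.
    timeline: dict = field(default_factory=dict)    # instant -> (action, state)
    plans: dict = field(default_factory=dict)       # rollback instant -> instant it rolls back
    data_files: dict = field(default_factory=dict)  # file -> instant that wrote it
    mdt: list = field(default_factory=list)         # instants applied to the MDT


def pending_rollbacks(t: Table) -> list:
    # Every requested/inflight rollback is reattempted, whether it is merely
    # pending or "dangling"; the timeline alone cannot tell the two apart.
    return [i for i, (a, s) in t.timeline.items()
            if a == "rollback" and s in ("requested", "inflight")]


def reattempt_rollback(t: Table, rb: str) -> None:
    # The target comes from the rollback's own plan, because its commit files
    # may already have been deleted from the timeline by the crashed attempt.
    target = t.plans[rb]
    # Each step is a no-op if the crashed attempt already did it.
    t.data_files = {f: i for f, i in t.data_files.items() if i != target}
    if rb not in t.mdt:
        t.mdt.append(rb)
    t.timeline.pop(target, None)
    t.timeline[rb] = ("rollback", "completed")


# Dangling state: C1's commit files are gone, but its rollback C4 never finished.
t = Table(timeline={"C2": ("commit", "completed"), "C3": ("commit", "completed"),
                    "C4": ("rollback", "inflight")},
          plans={"C4": "C1"},
          data_files={"f2": "C2", "f3": "C3"})
for rb in pending_rollbacks(t):
    reattempt_rollback(t, rb)
print(t.timeline["C4"], t.mdt)  # ('rollback', 'completed') ['C4']
```

Because every step is idempotent in this sketch, it does not matter how far the crashed attempt got before dying.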
> Missing to apply rollback commits to Metadata table if rollback failed mid-way
> ------------------------------------------------------------------------------
>
> Key: HUDI-3604
> URL: https://issues.apache.org/jira/browse/HUDI-3604
> Project: Apache Hudi
> Issue Type: Bug
> Components: metadata
> Reporter: sivabalan narayanan
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Timeline: C1, C2, C3; C4 (RB_C1) is in progress.
> When C4 (i.e., the rollback of C1) is triggered, suppose the process crashes
> after deleting the data files and the commit files of C1 from the timeline,
> but before applying the rollback to the MDT.
> Even if the user restarts the pipeline, there won't be any pending failed
> commits (i.e., C1) to roll back, so new commits will proceed without regard
> to C4. But the metadata table will miss this rollback commit.
>
> Proposal:
> We need at least two fixes:
> a. We should delete the C1 commit files from the data table timeline only
> after applying the rollback commit to the MDT. This ensures that no commit
> files in the data table are cleaned up before the rollback is applied to the
> MDT.
> b. Whenever we check for failed commits to roll back, we should also check for
> any dangling rollbacks to be re-attempted. This also needs some fixes in the
> rollback executor, since the commit to roll back may no longer exist in the
> data table timeline at all, yet we still need to re-attempt the rollback and
> drive it to completion (so that the metadata table can make progress with
> compactions). A dangling rollback cannot easily be distinguished from a
> pending one just by looking at the data table active timeline, hence we have
> to re-attempt any pending rollback instants and drive them to completion.
>
> Dangling rollbacks:
> Following up on above eg:
> C1, C2, C3; C4 (RB_C1) failed mid-way, with the crash happening after
> deleting the data files and the commit files in the data table timeline, but
> before applying the rollback to the MDT. If the user restarts the pipeline,
> Hudi will check for partially failed commits to roll back. But since C1 has
> already been deleted from the timeline by C4 (RB_C1), the rollback of C1 will
> not kick in. So C4, i.e., RB_C1, will stay in the timeline forever, since
> there is no other trigger that can drive it to completion or delete it.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)