[
https://issues.apache.org/jira/browse/HUDI-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan reassigned HUDI-2432:
-----------------------------------------
Assignee: sivabalan narayanan
> Fix restore by adding a requested instant and restore plan
> ----------------------------------------------------------
>
> Key: HUDI-2432
> URL: https://issues.apache.org/jira/browse/HUDI-2432
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Fix For: 0.10.0
>
>
> Fix restore by adding a requested instant and restore plan
>
> Trying to see if we really need a plan. Dumping my thoughts here.
> Restore internally converts to N rollbacks. We fetch the active instants in
> reverse order from the timeline and trigger rollbacks one by one. We already
> have a patch that fixes rollback by adding a rollback plan to the
> rollback.requested meta file. So, walking through the failure scenarios.
>
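A rough sketch of the "restore is N rollbacks in reverse order" idea, to make
the retry behaviour concrete. This is not Hudi code: `instants_to_rollback`,
`restore`, and the plain list standing in for the timeline are hypothetical,
and it assumes instant times are strings that sort chronologically.

```python
from typing import Callable, List


def instants_to_rollback(active_instants: List[str], restore_to: str) -> List[str]:
    """Instants strictly newer than restore_to, newest first (hypothetical helper)."""
    newer = [ts for ts in active_instants if ts > restore_to]
    return sorted(newer, reverse=True)


def restore(active_instants: List[str], restore_to: str,
            rollback: Callable[[str], None]) -> List[str]:
    """Roll back every instant after restore_to, one at a time.

    Returns the instants rolled back by THIS attempt only. If an earlier
    attempt crashed, the instants it already rolled back are no longer on
    the timeline and therefore do not show up here.
    """
    rolled_back = []
    for ts in instants_to_rollback(active_instants, restore_to):
        rollback(ts)                # may raise, simulating a process crash
        active_instants.remove(ts)  # a completed rollback removes the instant
        rolled_back.append(ts)
    return rolled_back
```

If the first attempt dies after three rollbacks, a second call to `restore`
only sees the two instants still on the timeline, which is why the restore
commit metadata from the second attempt would list just those two.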
> If 5 instants need to be rolled back, but the process crashed after 3
> rollbacks:
> * When we retry the restore a 2nd time, only the pending 2 will be returned
> from the timeline as instants that need to be rolled back, and so we will
> roll back the remaining 2 commits/instants. The only missing piece is that
> the list of rollback metadata serialized as part of the restore commit
> metadata might miss the first 3 commits. Anyway, restore is a destructive
> operation; not sure whether leaving the already rolled-back commits out of
> the restore commit metadata will cause any issues.
> ** Metadata table: the first 3 would have been rolled back in the metadata
> table as well (applied as upserts), and so should be fine when we retrigger
> the restore. The remaining 2 will get applied.
> ** If by chance one of the rollbacks gets committed to the metadata table and
> fails before getting committed to the data table: the 2nd rollback of the
> same instant is yet another delta commit to the metadata table, and we should
> be good there too.
> * If there was a crash while a rollback was inflight:
> ** Let's say the rollback of c3 failed while in progress. When we re-attempt
> the restore, we will try to roll back c3 again. With the fix for the rollback
> plan in place, we should be good, as we will continue the rollback and get it
> to completion.
> ** Metadata table: the first time, since the rollback failed while inflight,
> there won't be any trace of it in the metadata table. But when we retry a 2nd
> time, it should get applied to the metadata table. The rollback plan fix
> should ensure the rollback commit metadata has all file info from the
> original plan and not just the successfully deleted files, because in this
> case, during the 2nd attempt, only the pending files will be deleted.
>
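The inflight-rollback case can be sketched the same way. Again, this is only
an illustration under assumptions: `execute_rollback`, the `fail_after` knob,
the set standing in for storage, and the result keys are all made up here and
are not Hudi APIs. The point is that the reported file list comes from the
persisted plan, not from what a given attempt happened to delete.

```python
from typing import Dict, List, Set


def execute_rollback(plan: List[str], storage: Set[str],
                     fail_after: int = -1) -> Dict[str, List[str]]:
    """Delete the planned files that still exist and report the full plan.

    fail_after simulates a crash once that many files have been deleted
    in this attempt (-1 means never crash).
    """
    deleted_now = []
    for path in plan:
        if path not in storage:
            continue  # already removed by an earlier, interrupted attempt
        if fail_after >= 0 and len(deleted_now) == fail_after:
            raise RuntimeError("simulated crash during rollback")
        storage.remove(path)
        deleted_now.append(path)
    return {
        "deleted_in_this_attempt": deleted_now,
        # Taken from the plan, so a resumed attempt still reports every file.
        "files_in_commit_metadata": list(plan),
    }
```

On a retry after a crash, `deleted_in_this_attempt` holds only the leftover
files, while `files_in_commit_metadata` still matches the original plan, which
is what the metadata table would need to see.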
> From the looks of it, I don't see a real need for a restore plan. At least it
> does not block our metadata synchronous patch as such. But open to hearing
> from others.
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)