[jira] [Updated] (HUDI-2432) Fix restore by adding a requested instant and restore plan

sivabalan narayanan (Jira) Tue, 14 Sep 2021 16:45:05 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


sivabalan narayanan updated HUDI-2432:
--------------------------------------
        Parent: HUDI-1292
    Issue Type: Sub-task  (was: Improvement)

> Fix restore by adding a requested instant and restore plan
> ----------------------------------------------------------
>
>                 Key: HUDI-2432
>                 URL: https://issues.apache.org/jira/browse/HUDI-2432
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.10.0
>
>
> Fix restore by adding a requested instant and restore plan
>  
> I am trying to see whether we really need a plan, so dumping my thoughts here. 
> Restore internally translates to N rollbacks: we fetch the active instants in 
> reverse order from the timeline and trigger the rollbacks one by one. We 
> already have a patch that fixes rollback by adding the rollback plan to the 
> rollback.requested meta file. So, walking through the failure scenarios. 
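
The restore flow described above can be sketched roughly as follows. This is a minimal Python simulation under stated assumptions, not Hudi code: `Instant`, `instants_to_roll_back`, and `restore` are hypothetical names, the timeline is modeled as a plain list, and instant times are assumed to be fixed-width strings that sort chronologically.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Instant:
    """Hypothetical stand-in for a completed instant on the timeline."""
    timestamp: str


def instants_to_roll_back(timeline: List[Instant], restore_to: str) -> List[Instant]:
    """Return the instants strictly newer than restore_to, newest first."""
    newer = [i for i in timeline if i.timestamp > restore_to]
    return sorted(newer, key=lambda i: i.timestamp, reverse=True)


def restore(timeline: List[Instant], restore_to: str,
            rollback: Callable[[Instant], None]) -> List[str]:
    """Roll back one instant at a time and report the order that was used."""
    rolled_back = []
    for instant in instants_to_roll_back(timeline, restore_to):
        rollback(instant)          # one rollback per instant
        timeline.remove(instant)   # the instant leaves the active timeline
        rolled_back.append(instant.timestamp)
    return rolled_back


# Six instants on the timeline; restoring to 001 rolls back the newer five.
timeline = [Instant(f"00{n}") for n in range(1, 7)]
order = restore(timeline, "001", rollback=lambda instant: None)
print(order)
```

Because the set of instants to roll back is recomputed from the timeline on every call, a second call after a partial run naturally picks up only what is still pending.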
>  
> Suppose 5 instants need to be rolled back, but the process crashes after 3 rollbacks. 
>  * When we retry the restore, the timeline returns only the 2 pending 
> instants that still need to be rolled back, so we roll back the 
> remaining 2 commits/instants. The only missing piece is that the list of rollback 
> metadata serialized as part of the restore commit metadata might miss the 
> first 3 commits. Restore is a destructive operation anyway, so I am not sure 
> that omitting the already rolled-back commits from the restore commit metadata 
> will cause any issues. 
>  ** Metadata table: the first 3 would have been rolled back in the metadata table as 
> well (applied as upserts), so it should be fine when we retrigger the 
> restore. The remaining 2 will get applied. 
>  ** If by chance one of the rollbacks gets committed to the metadata table but 
> fails before getting committed to the data table, the second rollback of the same 
> instant is just another delta commit to the metadata table, so we should be good 
> there too. 
>  * If there was a crash while a rollback was inflight.
>  ** Let's say the rollback of c3 failed while in progress. When we re-attempt the 
> restore, we will try to roll back c3 again. With the fix for the rollback 
> plan in place, we should be good, as we will resume the rollback and take it 
> to completion. 
>  ** Metadata table: the first time, since the rollback failed while inflight, 
> there won't be any trace of it in the metadata table. When we retry, 
> it should get applied to the metadata table. The rollback plan fix should 
> ensure that the rollback commit metadata has all the file info from the original plan and not 
> just the files deleted in that attempt, because during the second attempt only 
> the pending files will be deleted.
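
The two failure scenarios above can be played out in a small simulation. This is only a sketch under stated assumptions: `Table`, `Crash`, `plans`, and `rollback_metadata` are hypothetical names invented for illustration, `plans` loosely models the plan stored in rollback.requested, and none of this reflects Hudi's actual classes or file layout.

```python
from typing import Dict, List, Optional, Set


class Crash(Exception):
    """Simulated process failure."""


class Table:
    """Toy model of a table: active instants, data files, persisted rollback plans."""

    def __init__(self, files_by_instant: Dict[str, List[str]]):
        self.files_by_instant = {k: list(v) for k, v in files_by_instant.items()}
        self.files: Set[str] = {f for fs in files_by_instant.values() for f in fs}
        self.active: List[str] = sorted(files_by_instant)
        self.plans: Dict[str, List[str]] = {}              # survives a crash
        self.rollback_metadata: Dict[str, List[str]] = {}  # completed rollbacks

    def rollback(self, instant: str, crash_after_deletes: Optional[int] = None) -> None:
        # Reuse the persisted plan if an earlier attempt already wrote one.
        plan = self.plans.setdefault(instant, list(self.files_by_instant[instant]))
        deleted = 0
        for f in plan:
            if crash_after_deletes is not None and deleted == crash_after_deletes:
                raise Crash(instant)
            if f in self.files:  # files removed by an earlier attempt are skipped
                self.files.discard(f)
                deleted += 1
        # Record every file from the plan, not only those deleted in this attempt.
        self.rollback_metadata[instant] = list(plan)
        self.active.remove(instant)

    def restore(self, restore_to: str, crash_on: Optional[str] = None,
                crash_after_deletes: int = 0) -> List[str]:
        done = []
        pending = sorted((i for i in self.active if i > restore_to), reverse=True)
        for instant in pending:
            self.rollback(instant, crash_after_deletes if instant == crash_on else None)
            done.append(instant)
        return done  # what this attempt would list in its restore metadata


# c1..c6 with two files each; restoring to c1 means rolling back 5 instants.
table = Table({f"c{n}": [f"c{n}_f1", f"c{n}_f2"] for n in range(1, 7)})
try:
    # c6, c5, c4 complete; the rollback of c3 crashes after deleting one file.
    table.restore("c1", crash_on="c3", crash_after_deletes=1)
except Crash:
    pass
active_after_crash = list(table.active)
second_attempt = table.restore("c1")
print(active_after_crash, second_attempt, table.rollback_metadata["c3"])
```

In this model the retry reports only c3 and c2, which mirrors the observation that the restore commit metadata can miss the first 3 rollbacks, while the rollback metadata for c3 still lists both files from the original plan.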
>  
> From the looks of it, I don't see a real need for a restore plan. At least it 
> does not block our metadata synchronous patch as such. But I am open to hearing from 
> others.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
