[jira] [Updated] (HUDI-2432) Fix restore by adding a requested instant and restore plan

sivabalan narayanan (Jira) Wed, 15 Sep 2021 08:27:04 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


sivabalan narayanan updated HUDI-2432:
--------------------------------------
    Description: 
Fix restore by adding a requested instant and restore plan

 

Trying to see if we really need a plan. Dumping my thoughts here. 

Restore internally converts to N rollbacks. We fetch the active instants in 
reverse order from the timeline and trigger the rollbacks one by one. We already 
have a patch that fixes rollback by adding the rollback plan to the 
rollback.requested meta file. So, walking through the failure scenarios. 

 

Say 5 instants need to be rolled back, but the process crashes after 3 
rollbacks. 
 * When we retry the restore a 2nd time, only the pending 2 will be returned 
from the timeline as instants that need to be rolled back, and so we will roll 
back the remaining 2 commits/instants. The only missing piece is that the list 
of rollback metadata serialized as part of the restore commit metadata might 
miss the first 3 commits. Anyway, restore is a destructive operation; not sure 
whether leaving the already rolled-back commits out of the restore commit 
metadata will cause any issues. 
 ** Metadata table: the first 3 would have been rolled back in the metadata 
table as well (applied as upserts), and so it should be fine when we retrigger 
the restore. The remaining 2 will get applied. 
 * If there was a crash while a rollback was inflight:
 ** Let's say the rollback of C3 failed while in progress. When we re-attempt 
the restore, we will try to roll back C3 again. With the rollback plan fix in 
place, we should be good, as we will continue the rollback and get it to 
completion, and then go on to roll back C2 and C1. 
 ** Metadata table: the first time, since the rollback of C3 failed while 
inflight, there won't be any trace of it in the metadata table. But when we 
retry a 2nd time, it should get applied to the metadata table. The rollback 
plan fix should ensure the rollback commit metadata has all file info from the 
original plan and not just the successfully deleted files, because in this 
case only the pending files will be deleted during the 2nd attempt.
 ** If by chance one of the rollbacks gets committed to the metadata table and 
fails before getting committed to the data table: the 2nd rollback of the same 
instant is just another delta commit to the metadata table, and so we should 
be good there too. We might end up instructing the metadata table to delete 
the same files repeatedly. 
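
To make the first scenario concrete, here is a rough sketch of a restore that 
crashes after 3 of 5 rollbacks and is then retried. It is a toy model, not Hudi 
code: the `restore` function, the `Crash` exception, the `crash_after` 
parameter and the instant names c0..c5 are all made up for illustration, and 
the timeline is just a list of strings that sort in commit order. 

```python
class Crash(Exception):
    """Simulated process failure (hypothetical, for illustration only)."""


def restore(timeline, restore_to, crash_after=None):
    """Roll back every instant newer than restore_to, newest first.

    Returns the instants rolled back in THIS attempt, which is all that the
    restore commit metadata can record when no restore plan was persisted.
    """
    rolled_back = []
    # Snapshot the pending instants in reverse order before mutating the timeline.
    pending = sorted((t for t in timeline if t > restore_to), reverse=True)
    for instant in pending:
        if crash_after is not None and len(rolled_back) == crash_after:
            raise Crash(f"crashed before rolling back {instant}")
        timeline.remove(instant)  # a finished rollback removes the instant
        rolled_back.append(instant)
    return rolled_back


timeline = ["c0", "c1", "c2", "c3", "c4", "c5"]

# First attempt: 5 instants to roll back, crash after 3 of them.
try:
    restore(timeline, "c0", crash_after=3)
except Crash:
    pass
remaining = list(timeline)

# Second attempt: only the 2 pending instants come back from the timeline.
second_attempt = restore(timeline, "c0")
```

In this model the 2nd attempt only knows about the 2 instants it rolled back 
itself, which is the gap in the restore commit metadata described above. 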

 

Update:

I didn't realize that individual rollbacks are not published to the timeline 
as part of restore. So, if restore fails midway, then in the 2nd attempt only a 
subset of the rollbacks (those performed during the 2nd attempt) will be 
applied to the metadata table. So, we need a plan for restore as well. 

But since restore is a destructive operation anyway, and users are advised to 
stop all processes, we do have the option to clean up the metadata table and 
re-bootstrap it completely once the restore is complete. 
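
A rough sketch of how a persisted restore plan could close that gap, under my 
own assumptions: the plan is written once when the restore is requested, a 
retry reuses it, and the restore commit metadata is built from the whole plan 
rather than from what the last attempt happened to roll back. The names here 
(`schedule_restore`, `execute_restore`, the `plans` dict, the 
`"restore.requested"` key) are hypothetical and not the actual Hudi API. 

```python
class Crash(Exception):
    """Simulated process failure (hypothetical, for illustration only)."""


def schedule_restore(timeline, restore_to, plans):
    """Persist the restore plan once; a retry reuses the pending plan."""
    if "restore.requested" not in plans:
        plans["restore.requested"] = sorted(
            (t for t in timeline if t > restore_to), reverse=True
        )
    return plans["restore.requested"]


def execute_restore(timeline, plan, crash_after=None):
    """Roll back the planned instants and report the full plan as restored."""
    done = 0
    for instant in plan:
        if instant not in timeline:
            continue  # already rolled back by an earlier attempt
        if crash_after is not None and done == crash_after:
            raise Crash(f"crashed before rolling back {instant}")
        timeline.remove(instant)
        done += 1
    # The metadata covers every planned instant, not only this attempt's work.
    return list(plan)


timeline = ["c0", "c1", "c2", "c3", "c4", "c5"]
plans = {}

# First attempt crashes after 3 rollbacks, but the plan is already persisted.
plan = schedule_restore(timeline, "c0", plans)
try:
    execute_restore(timeline, plan, crash_after=3)
except Crash:
    pass

# The retry picks up the same plan instead of recomputing it from the timeline.
retry_plan = schedule_restore(timeline, "c0", plans)
restored = execute_restore(timeline, retry_plan)
```

With the plan in hand, the retry can report all 5 instants to the metadata 
table even though it only rolled back 2 of them itself. 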

 

 

 

  was:
Fix restore by adding a requested instant and restore plan

 

Trying to see if we really need a plan. Dumping my thoughts here. 

Restore internally converts to N no of rollbacks. We fetch active instants in 
reverse order from timeline and trigger rollbacks 1 by 1. We have already have 
a patch fixing rollback to add rollback Plan in rollback.requested meta file. 
So, walking through failure scenarios. 

 

If 5 instants need to be rolledback, but process crashed after 3 rollbacks. 
 * When we retry restore 2nd time, only pending 2 will be returned from 
timeline for instants that need to be rolledback. And so we will rollback 
remaining 2 commits/instants. Only missing piece will be the list of rollback 
metadata that gets serialized as part of restore commit metadata might miss 
first 3 commits. Anyways, restore is a destructive operation, not sure if not 
serializing the already rolledback commit to restore commit metadata will cause 
any issues. 
 ** Metadata table: first 3 would have been rolledback in metadata table as 
well (applied as upsert). and so should be fine when we retrigger the restore. 
the rest 2 will get applied. 
 * If there was a crash during a rollback was inflight.
 ** let's say rollback of c3 failed while in progress. when we re-attempt 
restore, we will again try to rollback c3 again. With the fix for rollback plan 
in place, we should be good as we will continue the rollback and get it to 
completion. and then go on to rollback C2 and C1. 
 ** Metadata table: for first time, since the rollback of C3 failed while 
inflight, there won't be any trace of this in metadata table. but when we retry 
for 2nd time, this should get applied to metadata table. the rollback plan fix 
should ensure rollback commit metadata has all file info from original plan and 
not just the successfully deleted ones. bcoz, in this case, during 2nd time, 
only pending files will be deleted.
 ** If by chance, one of the rollback gets committted to metadata table and 
failed before getting committed to data table: the 2nd time rollback of same 
instant is yet another delta commit to metadata table and so we should be good 
there too. we might instruct metadata table to delete files repeatedly may be. 

 

Update:

I didn't realize that individual rollbacks are not published to timeline as 
part of restore. So, if restore fails midway, in the 2nd attempt, only subset 
of rollback will be applied to metadata table. so, we need a plan for restore 
as well. 

But since restore anyway is a destructive operation and is advised to stop all 
processes, we do have an option to clean up metadata table and reboostrap 
completely once restore is complete. 

 

 

 


> Fix restore by adding a requested instant and restore plan
> ----------------------------------------------------------
>
>                 Key: HUDI-2432
>                 URL: https://issues.apache.org/jira/browse/HUDI-2432
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.10.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
