[ 
https://issues.apache.org/jira/browse/HUDI-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717841#comment-17717841
 ] 

Prashant Wason commented on HUDI-6153:
--------------------------------------

Rollbacks and Restores in Metadata Table
----------------------------------------

*Nomenclature:*
 - Rollbacks are for fixing failed commits. This involves removing incomplete 
changes from a single commit.
 - Restore is for restoring the dataset to a point in time in the past. This 
involves removing completed changes from one or more commits.
 - Commits at time 't' are represented as Ct in the examples below
 - Deltacommits at time 't' are represented as DCt in the examples below
 - Rollbacks at time 't' are represented as Rt in the examples below
 - Restores at time 't' are represented as RSt in the examples below


*Rollbacks:*
When a rollback is issued (example: Rollback at time T6, rolling back failed 
commit at time T5), it will delete the relevant files from the dataset and 
create a HoodieRollbackMetadata. 
This is then applied to the metadata table (MDT). The MDT responds by rolling 
back the deltacommit corresponding to T5. This deltacommit will have the same 
timestamp as the dataset commit of T5.

There are four possible cases:
1. DC5 was after the latest compaction:
         Starting MDT timeline:  DC1 DC2 DC3 DC4 Commit4001 DC5

   In this case we can simply roll back DC5, which leads to the following 
MDT timeline:
         Final MDT timeline:  DC1 DC2 DC3 DC4 Commit4001 R6


2. DC5 was before the latest compaction:
         Starting MDT timeline:  DC1 DC2 DC3 DC4 DC5 Commit5001

   This should never happen as MDT compaction takes place ONLY if there are no 
incomplete instants in the dataset and MDT timelines. Since C5 is being rolled 
back, it cannot have completed (otherwise we would need a restore) and hence 
MDT compaction at 5001 cannot have taken place.


3. DC5 is not present in the MDT timeline:
         Starting MDT timeline:  DC1 DC2 DC3 DC4

    This means that C5 was not applied to the MDT in the first place. We don't 
really need to do anything here, but for the sake of proper debuggability we 
will create an empty Rollback in the MDT.

         Final MDT timeline:  DC1 DC2 DC3 DC4 R6


4. DC5 is not present in the MDT timeline and T5 is before the timeline starts:
         Starting MDT timeline:  DC6
                or
         Starting MDT timeline:  DC4001

    This should not happen as archiving will keep at least some instants and 
compaction cannot take place with a leftover C5 in the dataset.
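
The four cases above can be sketched as a small classifier. This is only an 
illustrative model, not Hudi code: the (kind, time) tuples, the RollbackCase 
enum and the classify_rollback / apply_rollback names are all hypothetical, 
and instant times are compared as plain integers to match the examples.

```python
from enum import Enum


class RollbackCase(Enum):
    AFTER_COMPACTION = 1       # case 1: DC is newer than the latest compaction
    BEFORE_COMPACTION = 2      # case 2: a compaction follows the DC (unexpected)
    NOT_APPLIED = 3            # case 3: the commit never reached the MDT
    BEFORE_TIMELINE_START = 4  # case 4: the MDT timeline starts later (unexpected)


def classify_rollback(mdt_timeline, commit_time):
    """Classify a rollback of the commit at commit_time.

    mdt_timeline is an ordered list of (kind, time) tuples, where kind is
    "DC" for a deltacommit and "Commit" for an MDT compaction (assumed model).
    """
    for i, (kind, time) in enumerate(mdt_timeline):
        if kind == "DC" and time == commit_time:
            # Position in the timeline decides whether a compaction came later.
            if any(k == "Commit" for k, _ in mdt_timeline[i + 1:]):
                return RollbackCase.BEFORE_COMPACTION
            return RollbackCase.AFTER_COMPACTION
    if mdt_timeline and mdt_timeline[0][1] > commit_time:
        return RollbackCase.BEFORE_TIMELINE_START
    return RollbackCase.NOT_APPLIED


def apply_rollback(mdt_timeline, commit_time, rollback_time):
    """Return the MDT timeline after the rollback, for the two valid cases."""
    case = classify_rollback(mdt_timeline, commit_time)
    if case in (RollbackCase.BEFORE_COMPACTION, RollbackCase.BEFORE_TIMELINE_START):
        raise ValueError(f"unexpected rollback state: {case.name}")
    # Case 1 drops the deltacommit; case 3 has nothing to drop, so the
    # appended rollback instant is effectively empty.
    kept = [inst for inst in mdt_timeline if inst != ("DC", commit_time)]
    return kept + [("R", rollback_time)]
```

In this sketch cases 2 and 4 raise an error, since the text above argues they 
should never occur; a real implementation may choose to handle them differently.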

 


*Restore:*
When a restore is issued, we move the dataset back in time. The latest write 
actions are removed one by one and finally a HoodieRestoreMetadata is 
generated. This will be updated into the MDT. 
The MDT responds by restoring itself to the same point in time. But some 
extra steps are required. Let's take an example:

Dataset Timeline: C1   C2   C3   C4   Clean5 C6   C7   Clean8
MDT Timeline:     DC1  DC2  DC3  DC4  DC5    DC6  DC7  DC8

We can see that each action on the dataset leads to a DC on the MDT timeline. 
So even non-write actions (Clean) are converted into write timeline actions 
for the MDT. This complicates restore, as the example below shows.

Assume we want to roll back the dataset to C3. We issue a Restore RS9 on the 
dataset which will remove C7, C6 and C4. Cleans cannot be removed during 
restore (in other words, restore moves back ONLY the write timeline).

Dataset Timeline after Restore: C1   C2   C3  Clean5 Clean8  RS9

Now, if we were to simply restore the MDT Timeline also to DC3, we will end up 
with this:

MDT Timeline after Restore:    DC1  DC2  DC3  RS9

In this case, we have removed not only DC7, DC6 and DC4 (corresponding to C7, 
C6 and C4 removed from the dataset) but also DC8, DC5 (corresponding to Clean5 
and Clean8 which are still left on the dataset timeline). 
This "removal" of delta commits corresponding to Cleans will cause the already 
cleaned files to re-appear within MDT and hence will make the MDT inconsistent.

So we will need to "reapply" all the Cleans after the restore operation 
completes. This will involve reading Clean5 and Clean8 and creating a new 
deltacommit on the MDT. The final state of the two timelines will be as follows:


Final dataset Timeline after Restore: C1   C2   C3  Clean5 Clean8  RS9
Final MDT Timeline after Restore:    DC1  DC2  DC3                RS9 DC9

It's DC9 above that contains the data for Clean5 and Clean8.
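
The restore example above can be reproduced with a small simulation. This is a 
sketch under simplifying assumptions, not Hudi's implementation: timelines are 
lists of (kind, time) tuples with integer times, and restore_dataset and 
restore_mdt are hypothetical helper names.

```python
def restore_dataset(timeline, restore_to, restore_time):
    """Drop write instants after restore_to; Cleans are never removed."""
    kept = [(k, t) for k, t in timeline if t <= restore_to or k == "Clean"]
    return kept + [("RS", restore_time)]


def restore_mdt(mdt_timeline, dataset_after_restore, restore_to, restore_time):
    """Restore the MDT, then replay the Cleans that survived on the dataset.

    Returns the new MDT timeline and the Clean instants folded into the
    replay deltacommit.
    """
    # Every MDT deltacommit after the restore point is removed, including
    # the ones that recorded Cleans.
    kept = [(k, t) for k, t in mdt_timeline if t <= restore_to]
    restored = kept + [("RS", restore_time)]
    # Cleans still on the dataset timeline after the restore point must be
    # reapplied, otherwise cleaned files would re-appear in the MDT.
    replayed = [
        (k, t) for k, t in dataset_after_restore
        if k == "Clean" and t > restore_to
    ]
    if replayed:
        restored.append(("DC", restore_time))
    return restored, replayed


dataset = [("C", 1), ("C", 2), ("C", 3), ("C", 4),
           ("Clean", 5), ("C", 6), ("C", 7), ("Clean", 8)]
mdt = [("DC", t) for t in range(1, 9)]

dataset_after = restore_dataset(dataset, restore_to=3, restore_time=9)
mdt_after, replayed = restore_mdt(mdt, dataset_after, restore_to=3, restore_time=9)
```

Here mdt_after ends with the restore instant followed by a single deltacommit 
at time 9, and replayed lists Clean5 and Clean8 as the content that deltacommit 
would carry.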

The above idea requires the Clean instants after the restore point (C3) to not 
have been archived. This cannot be guaranteed, for example when Cleans are 
issued aggressively, causing archival to kick in and remove the clean instants 
Clean5 or Clean8.


We can handle this by re-syncing the MDT itself post restore. This means we 
will do the following steps after restore is completed:
1. List all partitions and their files from the file system
2. For each partition, compare the file list to that saved within the MDT and 
find files which are recorded in the MDT but no longer exist on the file system
3. For all deleted files found in Step 2, create DC9 which is essentially the 
replay of Clean5 and Clean8.

Since Restore already requires all operations on the dataset to be stopped, we 
can therefore safely do these steps in code without worrying about the total 
time.
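
The re-sync steps listed above boil down to a per-partition set difference. 
The following is a minimal sketch, assuming both listings are already 
available as dictionaries from partition path to a set of file names; 
find_deleted_files is a hypothetical helper, not a Hudi API, and the partition 
and file names are made up.

```python
def find_deleted_files(fs_listing, mdt_listing):
    """Return {partition: sorted files} recorded in the MDT but missing on disk.

    fs_listing:  partition -> set of file names found on the file system (step 1)
    mdt_listing: partition -> set of file names recorded in the MDT
    """
    deleted = {}
    for partition, mdt_files in mdt_listing.items():
        # A partition missing from the file system listing has lost all files.
        on_disk = fs_listing.get(partition, set())
        missing = sorted(mdt_files - on_disk)  # step 2
        if missing:
            deleted[partition] = missing
    return deleted


# Illustrative data only.
fs_listing = {"2023/04/01": {"f1.parquet", "f3.parquet"}}
mdt_listing = {
    "2023/04/01": {"f1.parquet", "f2.parquet", "f3.parquet"},
    "2023/04/02": {"f4.parquet"},
}

# Step 3 would write these deletions to the MDT as the new deltacommit (DC9).
to_delete = find_deleted_files(fs_listing, mdt_listing)
```

Files that exist on the file system but not in the MDT are ignored here, since 
the steps above only ask for deletions to be replayed.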

 

> Change the rollback mechanism for MDT to actual rollbacks rather than 
> appending revert blocks
> ---------------------------------------------------------------------------------------------
>
>                 Key: HUDI-6153
>                 URL: https://issues.apache.org/jira/browse/HUDI-6153
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Prashant Wason
>            Assignee: Prashant Wason
>            Priority: Major
>
> When rolling back completed commits for indexes like record-index, the list 
> of all keys removed from the dataset is required. This information cannot be 
> available during rollback processing in MDT since the files have already been 
> deleted during the rollback inflight processing. 
> Hence, the current MDT rollback mechanism of adding -files, -col_stats 
> entries does not work for record index.
> This PR changes the rollback mechanism to actually roll back deltacommits on 
> the MDT. This makes the rollback handling faster and keeps the MDT in sync 
> with the dataset.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
