[
https://issues.apache.org/jira/browse/HUDI-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717841#comment-17717841
]
Prashant Wason commented on HUDI-6153:
--------------------------------------
Rollbacks and Restores in Metadata Table
----------------------------------------------------------
*Nomenclature:*
- Rollbacks are for fixing failed commits. This involves removing incomplete
changes from a single commit.
- Restore is for restoring the dataset to a point in time in the past. This
involves removing completed changes from one or more commits.
- Commits at time 't' are represented as Ct in the examples below
- Deltacommits at time 't' are represented as DCt in the examples below
- Rollbacks at time 't' are represented as Rt in the examples below
- Restores at time 't' are represented as RSt in the examples below
*Rollbacks:*
When a rollback is issued (example: rollback at time T6, rolling back the
failed commit at time T5), it deletes the relevant files from the dataset and
creates a HoodieRollbackMetadata.
This is then applied to the MDT. The MDT responds by rolling back the
deltacommit corresponding to T5. This deltacommit has the same timestamp
as the dataset commit at T5.
There are four possible cases:
1. DC5 was after the latest compaction:
Starting MDT timeline: DC1 DC2 DC3 DC4 Commit4001 DC5
In this case we can simply roll back DC5, which leads to the following
MDT timeline:
Final MDT timeline: DC1 DC2 DC3 DC4 Commit4001 R6
2. DC5 was before the latest compaction:
Starting MDT timeline: DC1 DC2 DC3 DC4 DC5 Commit5001
This should never happen, as MDT compaction takes place ONLY if there are no
incomplete instants in the dataset and MDT timelines. Since C5 is being rolled
back, it should not have completed (otherwise we would need a restore) and
hence MDT compaction at 5001 cannot have taken place.
3. DC5 is not present in the MDT timeline:
Starting MDT timeline: DC1 DC2 DC3 DC4
This means that C5 was not applied to the MDT in the first place. We don't
really need to do anything here, but for the sake of debuggability we will
create an empty Rollback in the MDT.
Final MDT timeline: DC1 DC2 DC3 DC4 R6
4. DC5 is not present in the MDT timeline because it is before the start of the timeline:
Starting MDT timeline: DC6
or
Starting MDT timeline: DC4001
This should not happen as archiving will keep at least some instants and
compaction cannot take place with a leftover C5 in the dataset.
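The four cases above can be sketched as a small toy model. This is only an illustration of the decision logic, not the Hudi implementation: the Instant class, parse, render and rollback_on_mdt are hypothetical names, timestamps are the single-token toy values used in this comment, and comparing them as strings to detect case 4 is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instant:
    action: str  # "deltacommit", "commit" (MDT compaction) or "rollback"
    ts: str      # toy timestamp, e.g. "5" or "4001"


PREFIXES = {"deltacommit": "DC", "commit": "Commit", "rollback": "R"}


def parse(text: str) -> List[Instant]:
    """Build a toy timeline from notation such as 'DC1 DC2 Commit2001'."""
    out = []
    for token in text.split():
        if token.startswith("DC"):
            out.append(Instant("deltacommit", token[2:]))
        elif token.startswith("Commit"):
            out.append(Instant("commit", token[6:]))
        elif token.startswith("R"):
            out.append(Instant("rollback", token[1:]))
        else:
            raise ValueError(f"unknown instant: {token}")
    return out


def render(timeline: List[Instant]) -> str:
    return " ".join(PREFIXES[i.action] + i.ts for i in timeline)


def rollback_on_mdt(timeline: List[Instant], failed_ts: str,
                    rollback_ts: str) -> List[Instant]:
    """Return the MDT timeline after rolling back the deltacommit at failed_ts."""
    rollback = Instant("rollback", rollback_ts)
    target = Instant("deltacommit", failed_ts)
    if target in timeline:
        idx = timeline.index(target)
        if any(i.action == "commit" for i in timeline[idx + 1:]):
            # Case 2: a compaction happened after the failed deltacommit.
            raise RuntimeError("compaction after an incomplete commit")
        # Case 1: drop the deltacommit and record the rollback.
        return timeline[:idx] + timeline[idx + 1:] + [rollback]
    if timeline and timeline[0].ts > failed_ts:
        # Case 4: the timeline starts after the failed commit (toy comparison).
        raise RuntimeError("failed commit is before the start of the timeline")
    # Case 3: nothing to undo, record an empty rollback for debuggability.
    return timeline + [rollback]


print(render(rollback_on_mdt(parse("DC1 DC2 DC3 DC4 Commit4001 DC5"), "5", "6")))
```

Cases 2 and 4 raise an error in this sketch because the comment treats them as states that should never occur.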
*Restore:*
When a restore is issued, we move the dataset back in time. The latest write
actions are removed one by one and finally a HoodieRestoreMetadata is
generated. This is then applied to the MDT.
The MDT responds by restoring itself to the same point in time, but some
extra steps are required. Let's take an example:
Dataset Timeline: C1 C2 C3 C4 Clean5 C6 C7 Clean8
MDT Timeline: DC1 DC2 DC3 DC4 DC5 DC6 DC7 DC8
We can see that each action on the dataset leads to a DC on the MDT timeline.
So even non-write actions (Clean) are converted into write timeline actions
on the MDT. This complicates restore, as the example below shows.
Assume we want to restore the dataset to C3. We issue a Restore RS9 on the
dataset which will remove C7, C6 and C4. Cleans cannot be removed during
restore (in other words, restore moves back ONLY the write timeline).
Dataset Timeline after Restore: C1 C2 C3 Clean5 Clean8 RS9
Now, if we were to simply restore the MDT Timeline also to DC3, we will end up
with this:
MDT Timeline after Restore: DC1 DC2 DC3 RS9
In this case, we have removed not only DC7, DC6 and DC4 (corresponding to C7,
C6 and C4 removed from the dataset) but also DC8 and DC5 (corresponding to
Clean8 and Clean5, which are still left on the dataset timeline).
This "removal" of deltacommits corresponding to Cleans will cause the already
cleaned files to re-appear within the MDT and hence will make the MDT
inconsistent. So we will need to "reapply" all the Cleans after the restore
operation completes. This will involve reading Clean5 and Clean8 and creating
a new deltacommit on the MDT. The final state of the two timelines will be as
follows:
Final dataset Timeline after Restore: C1 C2 C3 Clean5 Clean8 RS9
Final MDT Timeline after Restore: DC1 DC2 DC3 RS9 DC9
It is DC9 above that contains the data for Clean5 and Clean8.
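The restore example can be replayed with a short toy model. This is a sketch only: restore_with_clean_replay and render are hypothetical helpers, instants are plain (kind, timestamp) tuples, and the replay deltacommit reuses the restore timestamp as in the timelines shown above.

```python
from typing import List, Tuple

Instant = Tuple[str, int]  # (kind, toy timestamp), e.g. ("C", 3) or ("Clean", 5)


def render(timeline: List[Instant]) -> str:
    return " ".join(f"{kind}{ts}" for kind, ts in timeline)


def restore_with_clean_replay(dataset: List[Instant], mdt: List[Instant],
                              restore_to: int, restore_ts: int):
    """Restore both toy timelines to restore_to and replay surviving Cleans."""
    # Dataset: write actions after the restore point are removed, Cleans stay.
    kept = [(k, t) for k, t in dataset if t <= restore_to or k == "Clean"]
    surviving_cleans = [(k, t) for k, t in kept
                        if k == "Clean" and t > restore_to]
    new_dataset = kept + [("RS", restore_ts)]

    # MDT: every deltacommit after the restore point is removed, including
    # the ones that carried the Cleans.
    new_mdt = [(k, t) for k, t in mdt if t <= restore_to]
    new_mdt.append(("RS", restore_ts))

    # Reapply the surviving Cleans as one new deltacommit on the MDT.
    if surviving_cleans:
        new_mdt.append(("DC", restore_ts))
    return new_dataset, new_mdt, surviving_cleans


dataset = [("C", 1), ("C", 2), ("C", 3), ("C", 4),
           ("Clean", 5), ("C", 6), ("C", 7), ("Clean", 8)]
mdt = [("DC", t) for t in range(1, 9)]

new_dataset, new_mdt, replayed = restore_with_clean_replay(dataset, mdt, 3, 9)
print(render(new_dataset))
print(render(new_mdt))
```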
The above idea requires that the Clean instants after the restore point (C3)
have not been archived. This cannot be guaranteed, for example when Clean is
issued aggressively enough that archival kicks in and removes the clean
instants Clean8 or Clean5.
We can handle this by re-syncing the MDT itself after the restore. This means
we will do the following steps after the restore is completed:
1. List all partitions and their files from the file system
2. For each partition, compare the file list to that saved within the MDT and
find files which do not exist on the FileSystem
3. For all deleted files found in Step 2, create DC9 which is essentially the
replay of Clean5 and Clean8.
Since Restore already requires all operations on the dataset to be stopped, we
can therefore safely do these steps in code without worrying about the total
time.
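Step 2 of the re-sync is essentially a per-partition set difference. A minimal sketch, assuming both listings are already available as plain dictionaries from partition path to file names (find_deleted_files and the sample paths are hypothetical, not Hudi APIs):

```python
from typing import Dict, Iterable, List


def find_deleted_files(fs_listing: Dict[str, Iterable[str]],
                       mdt_listing: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return, per partition, files recorded in the MDT but missing on the file system."""
    deleted: Dict[str, List[str]] = {}
    for partition, mdt_files in mdt_listing.items():
        # A partition absent from the file system listing has no files left.
        on_fs = set(fs_listing.get(partition, ()))
        missing = set(mdt_files) - on_fs
        if missing:
            deleted[partition] = sorted(missing)
    return deleted


# Toy listings: f2 and f5 were removed by Cleans that the MDT no longer reflects.
fs_listing = {"2023/04/01": {"f1", "f3"}, "2023/04/02": {"f4"}}
mdt_listing = {"2023/04/01": {"f1", "f2", "f3"}, "2023/04/02": {"f4", "f5"}}

to_delete = find_deleted_files(fs_listing, mdt_listing)
print(to_delete)
```

The resulting map is what the new deltacommit (DC9 in the example) would record as deletions in Step 3.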
> Change the rollback mechanism for MDT to actual rollbacks rather than
> appending revert blocks
> ---------------------------------------------------------------------------------------------
>
> Key: HUDI-6153
> URL: https://issues.apache.org/jira/browse/HUDI-6153
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Prashant Wason
> Assignee: Prashant Wason
> Priority: Major
>
> When rolling back completed commits for indexes like record-index, the list
> of all keys removed from the dataset is required. This information cannot be
> available during rollback processing in MDT since the files have already been
> deleted during the rollback inflight processing.
> Hence, the current MDT rollback mechanism of adding -files, -col_stats
> entries does not work for record index.
> This PR changes the rollback mechanism to actually roll back deltacommits on
> the MDT. This makes the rollback handling faster and keeps the MDT in sync
> with the dataset.