danny0405 commented on code in PR #13653:
URL: https://github.com/apache/hudi/pull/13653#discussion_r2249062242
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/savepoint/SavepointHelpers.java:
##########
@@ -66,4 +72,28 @@ public static void validateSavepointPresence(HoodieTable table, String savepoint
       throw new HoodieRollbackException("No savepoint for instantTime " + savepointTime);
     }
   }
+
+  private static class SavepointInstantComparator implements Comparator<HoodieInstant> {
+    private final boolean tableVersion8OrLater;
+    private final InstantComparator instantComparator;
+
+    public SavepointInstantComparator(boolean tableVersion8OrLater, InstantComparator instantComparator) {
+      this.tableVersion8OrLater = tableVersion8OrLater;
+      this.instantComparator = instantComparator;
+    }
+
+    @Override
+    public int compare(HoodieInstant o1, HoodieInstant o2) {
+      if (tableVersion8OrLater) {
+        return instantComparator.completionTimeOrderedComparator().compare(o1, o2);
+      } else {
+        // Due to special handling of compaction instants, we need to use the requested-time-based comparator for compaction instants but the completion-time-based comparator for others
+        if (o1.getAction().equals(HoodieTimeline.COMMIT_ACTION) || o2.getAction().equals(HoodieTimeline.COMMIT_ACTION)) {
Review Comment:
> In v8, the delta commit is not directly tied to the base file commit time
so that is why we don't require this. In v8 if we remove the compaction in the
timeline described above, we can still safely query the table.
This is not true. Our assumption for file slices is that a newer file slice covers the whole history of the dataset. If the restore rolls back the compaction base files, the log files of that file slice are left in place with no base file to merge with on read, so we would get data loss (unless you keep the requested compaction metadata file on the timeline, but that does not seem to be the case).
For example, suppose we have:
`t1.dc.req, t1.dc, t2.dc.req, t2.dc, t3.compaction.req, t4.dc.req, t4.dc, t5.dc.req, t5.dc, t3.commit`.
Now we want to restore to t5. If, for a V8 table, the restore also rolls back t3.commit (because it completes after t5), the file slice that contains the t4 logs will hold only those logs, and the historical data written by the compaction would be lost.
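To make the data-loss argument concrete, here is a toy model of a file slice read, assuming simple key/value records with last-writer-wins merging. `FileSliceSketch`, its `FileSlice` record, and the record values are hypothetical and are not Hudi's `FileSlice` or merge APIs. It assumes the t3 compaction merged the t1 and t2 data into one base file, and that t4 and t5 wrote log files into the slice based at t3.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class FileSliceSketch {
  // Hypothetical, simplified model: a file slice is an optional base file plus the log
  // files appended after its base instant. Records are key -> value.
  record FileSlice(String baseInstant, Optional<Map<String, String>> baseFile,
                   List<Map<String, String>> logFiles) {}

  // Snapshot read: start from the base file (if present) and apply log records in order.
  static Map<String, String> read(FileSlice slice) {
    Map<String, String> merged = new LinkedHashMap<>();
    slice.baseFile().ifPresent(merged::putAll);
    for (Map<String, String> log : slice.logFiles()) {
      merged.putAll(log);
    }
    return merged;
  }

  // Logs written by t4 and t5 belong to the slice whose base instant is the t3 compaction.
  static final List<Map<String, String>> T4_T5_LOGS =
      List.of(Map.of("d", "v4"), Map.of("e", "v5"));

  // The t3 compaction merged t1 (a, b) and t2 (b updated, c) into one base file.
  static FileSlice withCompactedBase() {
    return new FileSlice("t3", Optional.of(Map.of("a", "v1", "b", "v2", "c", "v2")), T4_T5_LOGS);
  }

  // If the restore rolls back t3.commit, the base file is gone but the t4/t5 logs remain.
  static FileSlice afterRollingBackCompaction() {
    return new FileSlice("t3", Optional.empty(), T4_T5_LOGS);
  }

  public static void main(String[] args) {
    System.out.println(read(withCompactedBase()));
    System.out.println(read(afterRollingBackCompaction()));
  }
}
```

Reading the slice with its base file returns the keys written from t1 through t5; once the base file is gone, only the t4/t5 log records (`d` and `e`) remain, and the t1/t2 history is no longer reachable.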
So we should always use requested-time comparison for compaction instants, regardless of table version.
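A minimal sketch of the ordering question, assuming a restore rolls back every instant that sorts strictly after the savepoint. `RestoreOrderingSketch` and its `Instant` record are hypothetical stand-ins for `HoodieInstant` and the PR's comparator, with made-up integer requested/completion times chosen to match the example timeline in this comment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RestoreOrderingSketch {
  // Hypothetical stand-in for HoodieInstant: name, action, requested time, completion time.
  record Instant(String name, String action, long requestedTime, long completionTime) {
    // On a MOR table a completed compaction appears on the timeline as a "commit" action.
    boolean isCompactionCommit() {
      return action.equals("commit");
    }
  }

  // Example timeline: t3's compaction is requested early but completes last.
  static final Instant T1 = new Instant("t1.dc", "deltacommit", 10, 15);
  static final Instant T2 = new Instant("t2.dc", "deltacommit", 20, 25);
  static final Instant T3 = new Instant("t3.commit", "commit", 30, 60);
  static final Instant T4 = new Instant("t4.dc", "deltacommit", 40, 45);
  static final Instant T5 = new Instant("t5.dc", "deltacommit", 50, 55);
  static final List<Instant> TIMELINE = List.of(T1, T2, T4, T5, T3);

  // Pure completion-time ordering, as in the tableVersion8OrLater branch of the PR.
  static final Comparator<Instant> COMPLETION_ORDER =
      Comparator.comparingLong(Instant::completionTime);

  // Suggested ordering: requested time whenever a compaction is involved, otherwise
  // completion time, regardless of table version. Mixing two keys is not transitive in
  // general, so this sketch only uses it for pairwise checks against the savepoint.
  static final Comparator<Instant> COMPACTION_AWARE_ORDER = (a, b) ->
      (a.isCompactionCommit() || b.isCompactionCommit())
          ? Long.compare(a.requestedTime(), b.requestedTime())
          : Long.compare(a.completionTime(), b.completionTime());

  // Instants ordered strictly after the savepoint are the ones a restore would roll back.
  static List<String> rolledBack(List<Instant> timeline, Instant savepoint, Comparator<Instant> order) {
    List<String> result = new ArrayList<>();
    for (Instant instant : timeline) {
      if (order.compare(instant, savepoint) > 0) {
        result.add(instant.name());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Completion-time order rolls back t3.commit, dropping the compacted base files.
    System.out.println(rolledBack(TIMELINE, T5, COMPLETION_ORDER)); // prints [t3.commit]
    // Compaction-aware order keeps t3.commit because it was requested before t5.
    System.out.println(rolledBack(TIMELINE, T5, COMPACTION_AWARE_ORDER)); // prints []
  }
}
```

With completion-time ordering, restoring to t5 rolls back `t3.commit` because its completion time (60) is after t5's (55); comparing compactions by requested time keeps it, because t3 was requested before t5.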