danny0405 commented on code in PR #9416:
URL: https://github.com/apache/hudi/pull/9416#discussion_r1291949979
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java:
##########
@@ -452,107 +431,137 @@ private Stream<HoodieInstant> getCommitInstantsToArchive() throws IOException {
             ? CompactionUtils.getOldestInstantToRetainForCompaction(
                 table.getActiveTimeline(), config.getInlineCompactDeltaCommitMax())
             : Option.empty();
+    oldestInstantToRetainCandidates.add(oldestInstantToRetainForCompaction);
-    // The clustering commit instant can not be archived unless we ensure that the replaced files have been cleaned,
+    // 3. The clustering commit instant can not be archived unless we ensure that the replaced files have been cleaned,
     // without the replaced files metadata on the timeline, the fs view would expose duplicates for readers.
     // Meanwhile, when inline or async clustering is enabled, we need to ensure that there is a commit in the active timeline
     // to check whether the file slice generated in pending clustering after archive isn't committed.
     Option<HoodieInstant> oldestInstantToRetainForClustering =
         ClusteringUtils.getOldestInstantToRetainForClustering(table.getActiveTimeline(), table.getMetaClient());
+    oldestInstantToRetainCandidates.add(oldestInstantToRetainForClustering);
+
+    // 4. If metadata table is enabled, do not archive instants which are more recent than the
+    // last compaction on the metadata table.
+    if (table.getMetaClient().getTableConfig().isMetadataTableAvailable()) {
+      try (HoodieTableMetadata tableMetadata = HoodieTableMetadata.create(table.getContext(), config.getMetadataConfig(), config.getBasePath())) {
+        Option<String> latestCompactionTime = tableMetadata.getLatestCompactionTime();
+        if (!latestCompactionTime.isPresent()) {
+          LOG.info("Not archiving as there is no compaction yet on the metadata table");
+          return Collections.emptyList();
+        } else {
+          LOG.info("Limiting archiving of instants to latest compaction on metadata table at " + latestCompactionTime.get());
+          oldestInstantToRetainCandidates.add(Option.of(new HoodieInstant(
+              HoodieInstant.State.COMPLETED, COMPACTION_ACTION, latestCompactionTime.get())));
+        }
+      } catch (Exception e) {
+        throw new HoodieException("Error limiting instant archival based on metadata table", e);
+      }
+    }
+
+    // 5. If this is a metadata table, do not archive the commits that live in data set
+    // active timeline. This is required by metadata table,
+    // see HoodieTableMetadataUtil#processRollbackMetadata for details.
+    if (table.isMetadataTable()) {
+      HoodieTableMetaClient dataMetaClient = HoodieTableMetaClient.builder()
+          .setBasePath(HoodieTableMetadata.getDatasetBasePath(config.getBasePath()))
+          .setConf(metaClient.getHadoopConf())
+          .build();
+      Option<HoodieInstant> qualifiedEarliestInstant =
+          TimelineUtils.getEarliestInstantForMetadataArchival(
+              dataMetaClient.getActiveTimeline(), config.shouldArchiveBeyondSavepoint());
+
+      // Do not archive the instants after the earliest commit (COMMIT, DELTA_COMMIT, and
+      // REPLACE_COMMIT only, considering non-savepoint commit only if enabling archive
+      // beyond savepoint) and the earliest inflight instant (all actions).
+      // This is required by metadata table, see HoodieTableMetadataUtil#processRollbackMetadata
+      // for details.
+      // Todo: Remove #7580
Review Comment:
Incremental cleaning needs to deserialize the earlier clean plan. What happens if that clean instant has already been archived, is it affected?
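To make the concern concrete, here is a toy sketch (the class and method names are illustrative, not Hudi's actual API): incremental cleaning re-reads a previously serialized clean plan from the active timeline, and if the archiver has already moved that clean instant to the archived timeline, the lookup comes back empty, so the planner must detect the miss rather than fail.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the review concern, not Hudi code: the active
// timeline is modeled as a map from instant time to serialized plan bytes.
public class CleanPlanLookupSketch {

  private final Map<String, byte[]> activeTimeline = new HashMap<>();

  // A completed clean writes its plan to the active timeline.
  public void addCleanInstant(String instantTime, byte[] planBytes) {
    activeTimeline.put(instantTime, planBytes);
  }

  // Archiving removes the instant (and its plan) from the active timeline.
  public void archive(String instantTime) {
    activeTimeline.remove(instantTime);
  }

  // Incremental cleaning tries to load (deserialize) the previous clean plan.
  // An empty result is the "plan already archived" case: the caller has to
  // fall back, e.g. to a full scan, instead of assuming the plan is present.
  public Optional<byte[]> loadCleanPlan(String instantTime) {
    return Optional.ofNullable(activeTimeline.get(instantTime));
  }

  public static void main(String[] args) {
    CleanPlanLookupSketch timeline = new CleanPlanLookupSketch();
    timeline.addCleanInstant("001", new byte[] {1, 2, 3});

    // Plan is readable while the instant is still in the active timeline.
    System.out.println(timeline.loadCleanPlan("001").isPresent());

    // After archival, the plan is gone from the active timeline; this is
    // exactly the situation the comment asks about.
    timeline.archive("001");
    System.out.println(timeline.loadCleanPlan("001").isPresent());
  }
}
```

Whether the real archiver is affected depends on whether incremental cleaning can tolerate the missing plan or only ever reads plans newer than the retained instant.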
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]