Zouxxyy commented on code in PR #9416:
URL: https://github.com/apache/hudi/pull/9416#discussion_r1290859610


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java:
##########
@@ -452,107 +431,137 @@ private Stream<HoodieInstant> getCommitInstantsToArchive() throws IOException {
               ? CompactionUtils.getOldestInstantToRetainForCompaction(table.getActiveTimeline(), config.getInlineCompactDeltaCommitMax())
               : Option.empty();
+      oldestInstantToRetainCandidates.add(oldestInstantToRetainForCompaction);
 
-      // The clustering commit instant can not be archived unless we ensure that the replaced files have been cleaned,
+      // 3. The clustering commit instant can not be archived unless we ensure that the replaced files have been cleaned,
       // without the replaced files metadata on the timeline, the fs view would expose duplicates for readers.
       // Meanwhile, when inline or async clustering is enabled, we need to ensure that there is a commit in the active timeline
       // to check whether the file slice generated in pending clustering after archive isn't committed.
       Option<HoodieInstant> oldestInstantToRetainForClustering =
           ClusteringUtils.getOldestInstantToRetainForClustering(table.getActiveTimeline(), table.getMetaClient());
+      oldestInstantToRetainCandidates.add(oldestInstantToRetainForClustering);
+
+      // 4. If metadata table is enabled, do not archive instants which are more recent than the last compaction on the
+      // metadata table.
+      if (table.getMetaClient().getTableConfig().isMetadataTableAvailable()) {
+        try (HoodieTableMetadata tableMetadata = HoodieTableMetadata.create(table.getContext(), config.getMetadataConfig(), config.getBasePath())) {
+          Option<String> latestCompactionTime = tableMetadata.getLatestCompactionTime();
+          if (!latestCompactionTime.isPresent()) {
+            LOG.info("Not archiving as there is no compaction yet on the metadata table");
+            return Collections.emptyList();
+          } else {
+            LOG.info("Limiting archiving of instants to latest compaction on metadata table at " + latestCompactionTime.get());
+            oldestInstantToRetainCandidates.add(Option.of(new HoodieInstant(
+                HoodieInstant.State.COMPLETED, COMPACTION_ACTION, latestCompactionTime.get())));
+          }
+        } catch (Exception e) {
+          throw new HoodieException("Error limiting instant archival based on metadata table", e);
+        }
+      }
+
+      // 5. If this is a metadata table, do not archive the commits that live in data set
+      // active timeline. This is required by metadata table,
+      // see HoodieTableMetadataUtil#processRollbackMetadata for details.
+      if (table.isMetadataTable()) {
+        HoodieTableMetaClient dataMetaClient = HoodieTableMetaClient.builder()
+            .setBasePath(HoodieTableMetadata.getDatasetBasePath(config.getBasePath()))
+            .setConf(metaClient.getHadoopConf())
+            .build();
+        Option<HoodieInstant> qualifiedEarliestInstant =
+            TimelineUtils.getEarliestInstantForMetadataArchival(
+                dataMetaClient.getActiveTimeline(), config.shouldArchiveBeyondSavepoint());
+
+        // Do not archive the instants after the earliest commit (COMMIT, DELTA_COMMIT, and
+        // REPLACE_COMMIT only, considering non-savepoint commit only if enabling archive
+        // beyond savepoint) and the earliest inflight instant (all actions).
+        // This is required by metadata table, see HoodieTableMetadataUtil#processRollbackMetadata
+        // for details.
+        // Todo: Remove #7580

Review Comment:
   After this PR, #7580 is no longer useful; consider removing or simplifying it.
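
   The pattern in the diff, where each retention rule contributes an optional "oldest instant to retain" and archiving must stop at the earliest candidate, can be sketched as plain Java. This is a hypothetical, simplified illustration, not Hudi's actual `HoodieTimelineArchiver` code; the class and method names (`OldestInstantSketch`, `earliestToRetain`) are made up, and `Optional<String>` stands in for `Option<HoodieInstant>`:

   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.Optional;

   // Hypothetical sketch: reduce per-rule candidates (compaction, clustering,
   // metadata-table compaction, ...) to the single earliest instant to retain.
   public class OldestInstantSketch {
     static Optional<String> earliestToRetain(List<Optional<String>> candidates) {
       return candidates.stream()
           .filter(Optional::isPresent)
           .map(Optional::get)
           // Hudi instant times are fixed-width timestamps (e.g. yyyyMMddHHmmssSSS),
           // so natural string ordering picks the oldest instant.
           .min(String::compareTo);
     }

     public static void main(String[] args) {
       List<Optional<String>> candidates = Arrays.asList(
           Optional.of("20230810120000000"), // e.g. compaction rule's bound
           Optional.empty(),                 // a rule that imposes no bound
           Optional.of("20230809090000000")  // e.g. clustering rule's bound
       );
       // prints 20230809090000000
       System.out.println(earliestToRetain(candidates).orElse("none"));
     }
   }
   ```

   A rule that returns an empty candidate simply drops out of the reduction, which matches how `Option.empty()` candidates are handled in the diff.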



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
