yihua commented on a change in pull request #4821:
URL: https://github.com/apache/hudi/pull/4821#discussion_r832536233
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java
##########
@@ -429,25 +428,21 @@ public void mergeArchiveFiles(List<FileStatus> compactCandidate) throws IOExcept
.collect(Collectors.groupingBy(i -> Pair.of(i.getTimestamp(), HoodieInstant.getComparableAction(i.getAction()))));
- // If metadata table is enabled, do not archive instants which are more recent than the last compaction on the
- // metadata table.
- if (config.isMetadataTableEnabled()) {
- try (HoodieTableMetadata tableMetadata = HoodieTableMetadata.create(table.getContext(), config.getMetadataConfig(),
- config.getBasePath(), FileSystemViewStorageConfig.SPILLABLE_DIR.defaultValue())) {
- Option<String> latestCompactionTime = tableMetadata.getLatestCompactionTime();
- if (!latestCompactionTime.isPresent()) {
- LOG.info("Not archiving as there is no compaction yet on the metadata table");
- instants = Stream.empty();
- } else {
- LOG.info("Limiting archiving of instants to latest compaction on metadata table at " + latestCompactionTime.get());
- instants = instants.filter(instant -> HoodieTimeline.compareTimestamps(instant.getTimestamp(), HoodieTimeline.LESSER_THAN,
- latestCompactionTime.get()));
- }
- } catch (Exception e) {
- throw new HoodieException("Error limiting instant archival based on metadata table", e);
+ // If this is a metadata table, do not archive the commits that live in data set
+ // active timeline. This is required by metadata table,
+ // see HoodieTableMetadataUtil#processRollbackMetadata for details.
+ if (HoodieTableMetadata.isMetadataTable(config.getBasePath())) {
Review comment:
@danny0405 Let's add this new logic on top of the existing metadata-table-specific
logic (i.e., the check for the last compaction on the metadata table) and land
the fix soon, without changing the existing behavior.
I understand you have concerns about whether the compaction check is still
needed. We can take that to a separate PR for discussion. The goal here is to
land this fix quickly so that we can do another round of testing on the metadata
table. My worry is that the last-compaction check is still needed in some cases,
and if we remove it now, we may introduce a new problem right before the release
cut. For safety, let's keep it for now. WDYT?
If you're busy, I can take this up, revise the PR, and land it.
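To make the "on top of" suggestion concrete, here is a rough, self-contained sketch of keeping both checks and applying them one after the other. It is not the actual HoodieTimelineArchiver code. Instants are modeled as plain timestamp strings, and the class name, method names, and parameters (including `earliestActiveDataInstant`) are hypothetical stand-ins. The second filter reflects my reading of the new comment in the diff (on a metadata table, only archive instants older than the earliest instant still in the data table's active timeline), which is an assumption.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: instants are plain timestamp strings, compared lexicographically.
final class ArchivalFilterSketch {

  // Existing check (kept as is): when the metadata table is enabled, only instants
  // strictly older than its latest compaction may be archived; if there has been
  // no compaction yet, nothing is archived.
  static Stream<String> limitToLatestCompaction(Stream<String> instants,
                                                boolean metadataTableEnabled,
                                                Optional<String> latestCompactionTime) {
    if (!metadataTableEnabled) {
      return instants;
    }
    if (!latestCompactionTime.isPresent()) {
      return Stream.empty();
    }
    String limit = latestCompactionTime.get();
    return instants.filter(ts -> ts.compareTo(limit) < 0);
  }

  // New check (assumed semantics): when archiving the metadata table itself, keep
  // every instant that is not older than the earliest active data table instant.
  static Stream<String> limitToDataTimeline(Stream<String> instants,
                                            boolean isMetadataTable,
                                            Optional<String> earliestActiveDataInstant) {
    if (!isMetadataTable || !earliestActiveDataInstant.isPresent()) {
      return instants;
    }
    String limit = earliestActiveDataInstant.get();
    return instants.filter(ts -> ts.compareTo(limit) < 0);
  }

  // Both checks layered: the new filter runs on the output of the existing one.
  static List<String> archivable(List<String> instants,
                                 boolean metadataTableEnabled,
                                 Optional<String> latestCompactionTime,
                                 boolean isMetadataTable,
                                 Optional<String> earliestActiveDataInstant) {
    Stream<String> afterCompactionCheck =
        limitToLatestCompaction(instants.stream(), metadataTableEnabled, latestCompactionTime);
    return limitToDataTimeline(afterCompactionCheck, isMetadataTable, earliestActiveDataInstant)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> instants = List.of("001", "002", "003", "004");
    // Data table with the metadata table enabled, last compacted at "003".
    System.out.println(archivable(instants, true, Optional.of("003"), false, Optional.empty()));
    // Metadata table whose data table still has "002" as its earliest active instant.
    System.out.println(archivable(instants, false, Optional.empty(), true, Optional.of("002")));
  }
}
```

With this shape, removing the compaction check later (if the separate discussion concludes it is not needed) would only mean dropping the first filter, while the new fix stays untouched.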