prashantwason commented on a change in pull request #4821:
URL: https://github.com/apache/hudi/pull/4821#discussion_r812666713
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java
##########
@@ -429,25 +428,21 @@ public void mergeArchiveFiles(List<FileStatus>
compactCandidate) throws IOExcept
.collect(Collectors.groupingBy(i -> Pair.of(i.getTimestamp(),
HoodieInstant.getComparableAction(i.getAction()))));
- // If metadata table is enabled, do not archive instants which are more
recent than the last compaction on the
- // metadata table.
- if (config.isMetadataTableEnabled()) {
- try (HoodieTableMetadata tableMetadata =
HoodieTableMetadata.create(table.getContext(), config.getMetadataConfig(),
- config.getBasePath(),
FileSystemViewStorageConfig.SPILLABLE_DIR.defaultValue())) {
- Option<String> latestCompactionTime =
tableMetadata.getLatestCompactionTime();
Review comment:
There can be the case where deltacommit on the metadata table succeeds
but commit on dataset fails (inflight -> complete transition fails).
In this case, we should not be using the deltacommit on the metadata table
and to check that we need commits on the dataset to not be archived. If you
archive the commits on the datasets indeendantly then we cannot differentiate
between the error case above and the case where commit is missing from dataset
timeline because it was archived.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java
##########
@@ -429,25 +428,21 @@ public void mergeArchiveFiles(List<FileStatus>
compactCandidate) throws IOExcept
.collect(Collectors.groupingBy(i -> Pair.of(i.getTimestamp(),
HoodieInstant.getComparableAction(i.getAction()))));
- // If metadata table is enabled, do not archive instants which are more
recent than the last compaction on the
- // metadata table.
- if (config.isMetadataTableEnabled()) {
- try (HoodieTableMetadata tableMetadata =
HoodieTableMetadata.create(table.getContext(), config.getMetadataConfig(),
- config.getBasePath(),
FileSystemViewStorageConfig.SPILLABLE_DIR.defaultValue())) {
- Option<String> latestCompactionTime =
tableMetadata.getLatestCompactionTime();
- if (!latestCompactionTime.isPresent()) {
- LOG.info("Not archiving as there is no compaction yet on the
metadata table");
- instants = Stream.empty();
- } else {
- LOG.info("Limiting archiving of instants to latest compaction on
metadata table at " + latestCompactionTime.get());
- instants = instants.filter(instant ->
HoodieTimeline.compareTimestamps(instant.getTimestamp(),
HoodieTimeline.LESSER_THAN,
- latestCompactionTime.get()));
- }
- } catch (Exception e) {
- throw new HoodieException("Error limiting instant archival based on
metadata table", e);
+ // If this is a metadata table, do not archive the commits that live in
data set
+ // active timeline. This is required by metadata table,
+ // see HoodieTableMetadataUtil#processRollbackMetadata for details.
+ if (HoodieTableMetadata.isMetadataTable(config.getBasePath())) {
Review comment:
+1
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]