nsivabalan commented on a change in pull request #3426:
URL: https://github.com/apache/hudi/pull/3426#discussion_r702969171
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
##########
@@ -200,20 +200,19 @@ public boolean archiveIfRequired(HoodieEngineContext
context) throws IOException
.collect(Collectors.groupingBy(i -> Pair.of(i.getTimestamp(),
HoodieInstant.getComparableAction(i.getAction()))));
- // If metadata table is enabled, do not archive instants which are more
recent that the latest synced
- // instant on the metadata table. This is required for metadata table sync.
+ // If metadata table is enabled, do not archive instants which are more
recent that the last compaction on the
+ // metadata table.
if (config.useFileListingMetadata()) {
try (HoodieTableMetadata tableMetadata =
HoodieTableMetadata.create(table.getContext(), config.getMetadataConfig(),
config.getBasePath(),
FileSystemViewStorageConfig.FILESYSTEM_VIEW_SPILLABLE_DIR.defaultValue())) {
- Option<String> lastSyncedInstantTime =
tableMetadata.getSyncedInstantTime();
-
- if (lastSyncedInstantTime.isPresent()) {
- LOG.info("Limiting archiving of instants to last synced instant on
metadata table at " + lastSyncedInstantTime.get());
- instants = instants.filter(i ->
HoodieTimeline.compareTimestamps(i.getTimestamp(), HoodieTimeline.LESSER_THAN,
- lastSyncedInstantTime.get()));
- } else {
- LOG.info("Not archiving as there is no instants yet on the metadata
table");
+ Option<String> latestCompactionTime =
tableMetadata.getLatestCompactionTime();
Review comment:
bcoz, as per the patch, we assume metadata table is in full sync w/
dataset table, up until compacted time for sure. for delta commits, there could
be some gaps due to multi-writes. But when it comes to base file(metadata
table), we ensure there are no gaps and hence in this piece of code, if
compaction hasn't happened on metadata table, we can't archive. Else, we could
miss out on some commits not synced to metadata table. Also on similar lines,
we limit the instants only to those which are compacted in metadata table.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]