satishkotha commented on a change in pull request #2048:
URL: https://github.com/apache/hudi/pull/2048#discussion_r488207278
##########
File path:
hudi-client/src/main/java/org/apache/hudi/table/HoodieTimelineArchiveLog.java
##########
@@ -301,6 +304,61 @@ private void deleteAnyLeftOverMarkerFiles(JavaSparkContext jsc, HoodieInstant in
}
}
+ private void deleteReplacedFiles(HoodieInstant instant) {
+ if (!instant.isCompleted()) {
+ // only delete files for completed instants
+ return;
+ }
+
+ TableFileSystemView fileSystemView = this.table.getFileSystemView();
+ ensureReplacedPartitionsLoadedCorrectly(instant, fileSystemView);
+
+ Stream<HoodieFileGroup> fileGroupsToDelete = fileSystemView
Review comment:
Yes, compaction is the primary reason I only recorded the fileId in the
replace metadata. When deleting, we can get all file paths that have the same
fileId (through the view, or by listing using consolidated metadata) and delete
those files.
There is a potential race condition: compaction might create a new file with a
replaced fileId after we have queried for existing files. But because the
FileSystemView#get methods do not include replaced file groups, I think this is
unlikely to happen. I'm not sure whether there are edge cases with long-running
compactions.
Please suggest any other improvements.
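To make the deletion approach concrete, here is a minimal, self-contained sketch of the "collect every path sharing a replaced fileId, then delete" step. It does not use the actual Hudi FileSystemView API; `DataFile`, `pathsToDelete`, and the record fields are hypothetical stand-ins for illustration only.

```java
import java.util.*;
import java.util.stream.*;

public class ReplacedFileCleanupSketch {
  // Hypothetical stand-in for a data file slice: (fileId, path).
  record DataFile(String fileId, String path) {}

  // Return every path whose fileId was replaced. Note that multiple
  // versions of the same fileId (e.g. produced by compaction) all match,
  // which is why recording only the fileId in replace metadata suffices.
  static List<String> pathsToDelete(List<DataFile> allFiles,
                                    Set<String> replacedFileIds) {
    return allFiles.stream()
        .filter(f -> replacedFileIds.contains(f.fileId()))
        .map(DataFile::path)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<DataFile> files = List.of(
        new DataFile("f1", "p/f1_v1.parquet"),
        // compaction may add newer versions of the same fileId:
        new DataFile("f1", "p/f1_v2.parquet"),
        new DataFile("f2", "p/f2_v1.parquet"));
    // Both versions of f1 are selected for deletion; f2 is untouched.
    System.out.println(pathsToDelete(files, Set.of("f1")));
  }
}
```

The race discussed above corresponds to a compaction inserting a new `f1` version after `pathsToDelete` has run, which this one-shot query would miss.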
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]