zhangyue19921010 commented on a change in pull request #4810:
URL: https://github.com/apache/hudi/pull/4810#discussion_r809756988
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
##########
@@ -380,6 +380,19 @@ protected boolean isBaseFileDueToPendingCompaction(HoodieBaseFile baseFile) {
&& baseFile.getCommitTime().equals(compactionWithInstantTime.get().getKey());
}
+ /**
+ * With async clustering, it is possible to see partial/complete base-files due to inflight-clustering, Ignore those
+ * base-files.
+ *
+ * @param baseFile base File
+ */
+ protected boolean isBaseFileDueToPendingClustering(HoodieBaseFile baseFile) {
+ List<String> pendingReplaceInstants =
+ metaClient.getActiveTimeline().filterPendingReplaceTimeline().getInstants().map(HoodieInstant::getTimestamp).collect(Collectors.toList());
Review comment:
Hmm, I don't think we can use `fgIdToPendingClustering` as the filter here,
because the files recorded in `fgIdToPendingClustering` are already committed
and must remain visible.
What we need to filter out are the in-flight, uncommitted data files
produced by the clustering job.
To do that, we need the instant time of `xxxx.replacecommit.requested`
or `xxxx.replacecommit.inflight`, and we use it to filter out the uncommitted
data files that clustering is creating, not the files that are scheduled to be clustered.
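
To illustrate the distinction, here is a minimal, self-contained sketch of the
idea. It does not use the Hudi classes: `SketchBaseFile`,
`PendingClusteringFilterSketch`, and `visibleFileIds` are hypothetical names
made up for this example, and the set of pending instant times is assumed to
come from the pending replace timeline as in the diff above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for HoodieBaseFile: only the fields this sketch needs.
final class SketchBaseFile {
    final String fileId;
    final String commitTime;

    SketchBaseFile(String fileId, String commitTime) {
        this.fileId = fileId;
        this.commitTime = commitTime;
    }
}

final class PendingClusteringFilterSketch {

    // A base file is "due to pending clustering" when its commit time equals
    // the instant time of a replacecommit that is still requested/inflight.
    // Source files that are merely scheduled for clustering carry an older,
    // completed commit time, so they are NOT matched here.
    static boolean isBaseFileDueToPendingClustering(
            SketchBaseFile baseFile, Set<String> pendingReplaceInstantTimes) {
        return pendingReplaceInstantTimes.contains(baseFile.commitTime);
    }

    // Keep only the files a reader should see.
    static List<String> visibleFileIds(
            List<SketchBaseFile> baseFiles, Set<String> pendingReplaceInstantTimes) {
        return baseFiles.stream()
                .filter(f -> !isBaseFileDueToPendingClustering(f, pendingReplaceInstantTimes))
                .map(f -> f.fileId)
                .collect(Collectors.toList());
    }
}
```

With a pending replace instant `003`, a source file written at commit `001`
stays visible even though it is part of the clustering plan, while a file
written by the clustering job itself at `003` is hidden until the
replacecommit completes.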