yihua commented on code in PR #8797:
URL: https://github.com/apache/hudi/pull/8797#discussion_r1206094130
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/InstantRange.java:
##########
@@ -174,6 +195,11 @@ public Builder nullableBoundary(boolean nullable) {
       return this;
     }
+    public Builder explicitInstants(Set<String> instants) {
+      this.explicitInstants = CollectionUtils.createImmutableSet(instants);
+      return this;
+    }
+
Review Comment:
This is only used by `ExplicitMatchRange`. Should this setter be in the
builder of `ExplicitMatchRange` only?
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java:
##########
@@ -247,6 +256,17 @@ public List<WriteStatus> compact(HoodieCompactionHandler compactionHandler,
     }).collect(toList());
   }
+  private InstantRange getMetadataLogReaderInstantRange(HoodieTableMetaClient metadataMetaClient, String metadataBasePath) {
Review Comment:
@danny0405 @nsivabalan @prashantwason I'm a little rusty on the interplay
between MDT compaction and inflight instants on the data table's timeline. While
I'm reading more code to refresh my memory, could you folks remind me how
inflight instants on the data table's timeline are handled after MDT
compaction is done in this case? I'm trying to see if the changes in this PR
are safe.
Suppose there's an inflight instant before a completed instant in the data
table's timeline (`C1.commit, C2.commit.inflight, C3.commit`) due to an inflight
table service or concurrent writers, so there's a hole in the timeline. In the MDT
timeline, C1, C2, and C3 are all committed and have generated log files, and C2
should be excluded by the metadata table merged log scanner, since C2 is not
committed in the data table. If MDT compaction happens at this point, C2 is
skipped in compaction. Is there any case where C2's log file in MDT is kept for
later and ends up being missed in MDT, since it's in the old file slice?
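
To make the scenario concrete, here is a minimal, self-contained sketch in plain Java. It is not the Hudi API: `TimelineHoleSketch`, `visibleInstants`, and `skippedInstants` are hypothetical names, and it only models the filtering step I described above (MDT log instants checked against the set of instants completed on the data table's timeline), under the assumption that the filter is an exact-match lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names are Hudi classes or methods.
final class TimelineHoleSketch {

  // Returns the MDT log instants a reader would merge: those whose instant
  // is completed on the data table's timeline.
  static List<String> visibleInstants(List<String> mdtLogInstants,
                                      Set<String> completedOnDataTable) {
    List<String> visible = new ArrayList<>();
    for (String instant : mdtLogInstants) {
      if (completedOnDataTable.contains(instant)) {
        visible.add(instant);
      }
    }
    return visible;
  }

  // Returns the MDT log instants the same filter leaves out, i.e. the ones
  // whose fate after compaction is the question here.
  static List<String> skippedInstants(List<String> mdtLogInstants,
                                      Set<String> completedOnDataTable) {
    List<String> skipped = new ArrayList<>();
    for (String instant : mdtLogInstants) {
      if (!completedOnDataTable.contains(instant)) {
        skipped.add(instant);
      }
    }
    return skipped;
  }
}
```

With MDT log instants `C1, C2, C3` and only `C1` and `C3` completed on the data table, `visibleInstants` yields `C1, C3` and `skippedInstants` yields `C2`. What I'd like to confirm is what happens to that skipped `C2` log file once the compacted base file exists and C2 later completes on the data table.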