yihua commented on code in PR #8797:
URL: https://github.com/apache/hudi/pull/8797#discussion_r1206094130


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/InstantRange.java:
##########
@@ -174,6 +195,11 @@ public Builder nullableBoundary(boolean nullable) {
       return this;
     }
 
+    public Builder explicitInstants(Set<String> instants) {
+      this.explicitInstants = CollectionUtils.createImmutableSet(instants);
+      return this;
+    }
+

Review Comment:
   This is only used by `ExplicitMatchRange`. Should this setter be in the builder of `ExplicitMatchRange` only?
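Here is a minimal sketch of the suggestion. It assumes a standalone range class whose own builder owns the explicit-instant setter, so the generic `InstantRange.Builder` no longer needs one. `ExplicitMatchRangeSketch` and `isInRange` are hypothetical stand-ins, not Hudi's actual classes. The JDK's `Collections.unmodifiableSet` over a defensive copy plays the role that `CollectionUtils.createImmutableSet` plays in the diff.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for ExplicitMatchRange: the explicit-instants
// setter lives only on this class's own builder.
public final class ExplicitMatchRangeSketch {
  private final Set<String> instants;

  private ExplicitMatchRangeSketch(Set<String> instants) {
    this.instants = instants;
  }

  // Matches only instant times that were explicitly supplied.
  public boolean isInRange(String instantTime) {
    return instants.contains(instantTime);
  }

  public static Builder builder() {
    return new Builder();
  }

  public static final class Builder {
    private Set<String> explicitInstants = Collections.emptySet();

    // Copy, then wrap read-only, so later changes to the caller's set
    // do not leak into the built range.
    public Builder explicitInstants(Set<String> instants) {
      this.explicitInstants = Collections.unmodifiableSet(new HashSet<>(instants));
      return this;
    }

    public ExplicitMatchRangeSketch build() {
      return new ExplicitMatchRangeSketch(explicitInstants);
    }
  }

  public static void main(String[] args) {
    Set<String> completed = new HashSet<>(Set.of("001", "003"));
    ExplicitMatchRangeSketch range = builder().explicitInstants(completed).build();
    completed.add("002"); // mutating the source set after build() has no effect
    System.out.println(range.isInRange("001")); // true
    System.out.println(range.isInRange("002")); // false
  }
}
```

With this shape, callers that build other range types never see an `explicitInstants` option that those types would silently ignore.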



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java:
##########
@@ -247,6 +256,17 @@ public List<WriteStatus> compact(HoodieCompactionHandler compactionHandler,
     }).collect(toList());
   }
 
+  private InstantRange getMetadataLogReaderInstantRange(HoodieTableMetaClient metadataMetaClient, String metadataBasePath) {

Review Comment:
   @danny0405 @nsivabalan @prashantwason I'm a little rusty on the interplay between MDT compaction and inflight instants on the data table's timeline. While I'm reading more code to refresh my memory, could you folks remind me how the inflight instants on the data table's timeline are handled after MDT compaction is done in this case? I'm trying to see whether the changes in this PR are safe.
   
   Suppose there's an inflight instant before a completed instant on the data table's timeline (`C1.commit, C2.commit.inflight, C3.commit`) due to an inflight table service or concurrent writers, which leaves a hole in the timeline. On the MDT timeline, C1, C2 and C3 are all committed and have generated log files, and C2 should be excluded by the metadata table's merged log scanner, since C2 is not yet committed on the data table. If MDT compaction happens at this point, C2 is skipped in compaction. Is there any case where C2's log file in the MDT is kept for later and then missed by the MDT, because it sits in the old file slice?
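To make the scenario concrete, here is a toy model, not Hudi code. It assumes the data table timeline can be reduced to a map from instant time to a completion state, the MDT file slice to a list of log blocks tagged with the data-table instant that wrote them, and the reader's instant filter to the explicit set of completed data-table instants. All names (`TimelineHoleSketch`, `LogBlock`, `completedInstants`, `blocksSkippedByCompaction`) are hypothetical. It only shows which blocks a compaction using that filter would leave behind in the old file slice. It does not answer whether Hudi later picks the skipped block back up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the timeline hole; not Hudi's actual classes.
public final class TimelineHoleSketch {
  // Simplified completion state of a data-table instant.
  enum State { INFLIGHT, COMPLETED }

  // An MDT log block written on behalf of a data-table instant.
  record LogBlock(String instantTime) {}

  // Explicit set of instants the reader or compactor may merge.
  static Set<String> completedInstants(Map<String, State> dataTimeline) {
    Set<String> out = new TreeSet<>();
    dataTimeline.forEach((instant, state) -> {
      if (state == State.COMPLETED) {
        out.add(instant);
      }
    });
    return out;
  }

  // Blocks in the old file slice that a filtered compaction would not merge.
  static List<String> blocksSkippedByCompaction(List<LogBlock> slice, Set<String> allowed) {
    List<String> skipped = new ArrayList<>();
    for (LogBlock block : slice) {
      if (!allowed.contains(block.instantTime())) {
        skipped.add(block.instantTime());
      }
    }
    return skipped;
  }

  public static void main(String[] args) {
    // Data table timeline with a hole: C1.commit, C2.commit.inflight, C3.commit.
    Map<String, State> dataTimeline = new LinkedHashMap<>();
    dataTimeline.put("C1", State.COMPLETED);
    dataTimeline.put("C2", State.INFLIGHT);
    dataTimeline.put("C3", State.COMPLETED);

    // In the MDT, C1, C2 and C3 have all written log blocks to the same file slice.
    List<LogBlock> oldSlice = List.of(new LogBlock("C1"), new LogBlock("C2"), new LogBlock("C3"));

    Set<String> allowed = completedInstants(dataTimeline);
    System.out.println("merge filter: " + allowed);                                       // [C1, C3]
    System.out.println("left in old slice: " + blocksSkippedByCompaction(oldSlice, allowed)); // [C2]
  }
}
```

The open question above is what happens to the `[C2]` block once the compacted base file becomes the latest file slice and C2 later completes on the data table.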



-- 
This is an automated message from the Apache Git Service.