danny0405 commented on code in PR #11947:
URL: https://github.com/apache/hudi/pull/11947#discussion_r1802209333


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java:
##########
@@ -174,6 +170,64 @@ public static QueryInfo generateQueryInfo(JavaSparkContext jssc, String srcBaseP
     }
   }
 
+  public static IncrementalQueryAnalyzer getIncrementalQueryAnalyzer(
+      JavaSparkContext jssc,
+      String srcPath,
+      Option<String> lastCkptStr,
+      MissingCheckpointStrategy missingCheckpointStrategy,
+      int numInstantsFromConfig,
+      Option<SourceProfile<Integer>> latestSourceProfile) {
+    HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
+        .setConf(HadoopFSUtils.getStorageConfWithCopy(jssc.hadoopConfiguration()))
+        .setBasePath(srcPath)
+        .setLoadActiveTimelineOnLoad(true)
+        .build();
+
+    String startTime;
+    if (lastCkptStr.isPresent() && !lastCkptStr.get().isEmpty()) {
+      startTime = lastCkptStr.get();
+    } else if (missingCheckpointStrategy != null) {
+      switch (missingCheckpointStrategy) {
+        case READ_UPTO_LATEST_COMMIT:
+          startTime = DEFAULT_BEGIN_TIMESTAMP;
+          // disrespect numInstantsFromConfig when reading up to latest
+          numInstantsFromConfig = -1;
+          break;
+        case READ_LATEST:
+          Option<HoodieInstant> lastInstant = metaClient
+              .getCommitsAndCompactionTimeline()
+              .filterCompletedInstants()
+              .lastInstant();
+          startTime = lastInstant
+              .map(hoodieInstant -> instantTimeMinusMillis(hoodieInstant.getCompletionTime(), 1))

Review Comment:
   I still think that always excluding the start instant does not make sense.
For example, for the very first commit to consume, the start should be
inclusive; it should only be exclusive when we are sure the read is a
continuous follow-up to the preceding reads.
   
   Also, for batch incremental reads, I think the start commit should be
inclusive.
   
   So, in the code, let's make the inclusiveness/exclusiveness adjustable
instead of hard-coded, and only make the start instant exclusive when we are
sure the read is a continuous read.
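
   To make the suggestion concrete, here is a minimal sketch of an
adjustable start boundary. Everything in it is hypothetical and is not part of
Hudi or of this PR: the `StartBoundary` enum, `boundaryFor`, and
`instantsToRead` are illustrative names, and completion times are modeled as
plain strings compared lexicographically. It assumes that the presence of a
checkpoint is what tells us the read is a continuous one.

```java
import java.util.List;
import java.util.stream.Collectors;

public class StartBoundarySketch {
  /** Hypothetical flag: whether the start instant itself is part of the read. */
  enum StartBoundary { INCLUSIVE, EXCLUSIVE }

  /**
   * Assumption for this sketch: a read is "continuous" only when it resumes
   * from a checkpoint, so only then is the start instant excluded.
   */
  static StartBoundary boundaryFor(boolean hasCheckpoint) {
    return hasCheckpoint ? StartBoundary.EXCLUSIVE : StartBoundary.INCLUSIVE;
  }

  /** Keeps the completion times at or after (or strictly after) startTime. */
  static List<String> instantsToRead(
      List<String> completionTimes, String startTime, StartBoundary boundary) {
    return completionTimes.stream()
        .filter(t -> {
          int cmp = t.compareTo(startTime);
          return boundary == StartBoundary.INCLUSIVE ? cmp >= 0 : cmp > 0;
        })
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> timeline = List.of("001", "002", "003");
    // First read, no checkpoint: the start instant is consumed too.
    System.out.println(instantsToRead(timeline, "001", boundaryFor(false)));
    // Follow-up read from checkpoint "002": "002" was already consumed.
    System.out.println(instantsToRead(timeline, "002", boundaryFor(true)));
  }
}
```

   With an explicit boundary like this, the `READ_LATEST` branch would not
need to subtract a millisecond from the completion time just to force the last
instant into an exclusive range.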



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
