Re: [PR] [HUDI-8141] Incremental Query with Completion Time [hudi]

via GitHub Thu, 17 Oct 2024 18:56:40 -0700


danny0405 commented on code in PR #11947:
URL: https://github.com/apache/hudi/pull/11947#discussion_r1805712611



##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/SnapshotLoadQuerySplitter.java:
##########
@@ -118,6 +124,32 @@ public QueryInfo getNextCheckpoint(Dataset<Row> df, QueryInfo queryInfo, Option<
         .orElse(queryInfo);
   }
 
+  public Option<CheckpointWithPredicates> getNextCheckpoint(Dataset<Row> df, QueryContext queryContext,
+                                                            Option<SourceProfileSupplier> sourceProfileSupplier) {
+    // the start instant would be included into the final query result. So we need to get
+    // a strictly lower timestamp to have query splitter include the start instant
+    Option<CheckpointWithPredicates> nextCheckpointWithPredicates =
+        getNextCheckpointWithPredicates(df, instantTimeMinusMillis(queryContext.getBeginInstant().get(), 1));
+    if (nextCheckpointWithPredicates.isPresent()) {
+      // getNextCheckpointWithPredicates is based on instant times,
+      // so we need to translate the instant time to the completion time
+      String endInstantTime = nextCheckpointWithPredicates.get().getEndInstant();
+      Option<String> endCompletionTime = Option.fromJavaOptional(queryContext.getInstants().stream()
+          .filter(instant -> endInstantTime.equals(instant.getTimestamp()))

Review Comment:
   Isn't this just the max completion time from the query context?
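
   For reference, a minimal sketch of the two options being compared. It does not use Hudi classes: `TimelineInstant`, `completionTimeOf`, and `maxCompletionTime` are hypothetical stand-ins, and the timestamps are made-up short strings. Under those assumptions, the two approaches agree when the splitter's end instant is the last instant in the context and also the last to complete. They can differ if the splitter stops at an earlier instant. Whether that can happen in this PR depends on what `getNextCheckpointWithPredicates` returns, which this sketch does not model.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class CompletionTimeLookup {

  // Hypothetical stand-in for a timeline instant: its instant (requested) time
  // and the time at which it completed. Not the Hudi class.
  static final class TimelineInstant {
    final String timestamp;
    final String completionTime;

    TimelineInstant(String timestamp, String completionTime) {
      this.timestamp = timestamp;
      this.completionTime = completionTime;
    }
  }

  // Approach in the diff: find the instant whose timestamp equals the end
  // instant time and return that instant's completion time.
  static Optional<String> completionTimeOf(List<TimelineInstant> instants, String endInstantTime) {
    return instants.stream()
        .filter(instant -> endInstantTime.equals(instant.timestamp))
        .map(instant -> instant.completionTime)
        .findFirst();
  }

  // Approach suggested in the review: take the largest completion time
  // among all instants in the context.
  static Optional<String> maxCompletionTime(List<TimelineInstant> instants) {
    return instants.stream()
        .map(instant -> instant.completionTime)
        .max(Comparator.naturalOrder());
  }

  public static void main(String[] args) {
    // Made-up data: instant "002" starts after "001" but completes before it.
    List<TimelineInstant> instants = List.of(
        new TimelineInstant("001", "005"),
        new TimelineInstant("002", "004"),
        new TimelineInstant("003", "006"));

    System.out.println(completionTimeOf(instants, "003")); // Optional[006]
    System.out.println(completionTimeOf(instants, "002")); // Optional[004]
    System.out.println(maxCompletionTime(instants));       // Optional[006]
  }
}
```

   With the end instant at "003" both lookups give "006". With the end instant at "002" the per-instant lookup gives "004" while the max is still "006".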



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

