codope commented on code in PR #10218:
URL: https://github.com/apache/hudi/pull/10218#discussion_r1410419696


##########
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/CompletionTimeQueryView.java:
##########
@@ -159,6 +166,50 @@ public Option<String> getCompletionTime(String startTime) {
       // the instant is still pending
       return Option.empty();
     }
+    loadCompletionTimeIncrementally(startTime);
+    return Option.ofNullable(this.startToCompletionInstantTimeMap.get(startTime));
+  }
+
+  /**
+   * Queries the instant start time with given completion time range.
+   *
+   * <p>By default, assumes there is at most 1 day time of duration for an instant to accelerate the queries.
+   *
+   * @param startCompletionTime The start completion time.
+   * @param endCompletionTime   The end completion time.
+   *
+   * @return The instant time set.
+   */
+  public Set<String> getStartTimeSet(String startCompletionTime, String endCompletionTime) {
+    // assumes any instant/transaction lasts at most 1 day to optimize the query efficiency.

Review Comment:
   This is typically a safe assumption, since most commits last from a few
   minutes to an hour or so. But in some cases, such as when the pipeline is
   blocked, a commit can remain pending for longer. Should this be an internal
   config?
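
   A minimal sketch of what such a config could look like, under stated
   assumptions: the key `_hoodie.timeline.max.instant.duration.ms` and the class
   and helper names are hypothetical and not existing Hudi options, and the date
   arithmetic is a simplified stand-in for
   `HoodieInstantTimeGenerator.instantTimeMinusMillis` that assumes a
   17-character `yyyyMMddHHmmssSSS` instant time. The default keeps the 1-day
   lookback from this PR.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Properties;
import java.util.function.Function;

class LookbackConfigSketch {
  // Hypothetical internal config key; not an existing Hudi option.
  static final String MAX_INSTANT_DURATION_MS_KEY = "_hoodie.timeline.max.instant.duration.ms";
  // Default mirrors the 1-day assumption made in the PR.
  static final long DEFAULT_MAX_INSTANT_DURATION_MS = Duration.ofDays(1).toMillis();

  private static final DateTimeFormatter SECONDS_FORMAT =
      DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

  // Simplified stand-in for HoodieInstantTimeGenerator.instantTimeMinusMillis,
  // assuming a 17-character "yyyyMMddHHmmssSSS" instant time.
  static String instantTimeMinusMillis(String instantTime, long millis) {
    LocalDateTime seconds = LocalDateTime.parse(instantTime.substring(0, 14), SECONDS_FORMAT);
    long millisPart = Long.parseLong(instantTime.substring(14, 17));
    LocalDateTime shifted =
        seconds.plusNanos(millisPart * 1_000_000L).minus(Duration.ofMillis(millis));
    return shifted.format(SECONDS_FORMAT) + String.format("%03d", shifted.getNano() / 1_000_000);
  }

  // Builds the earliestStartTimeFunc from config instead of a hard-coded constant.
  static Function<String, String> earliestStartTimeFunc(Properties props) {
    long lookbackMs = Long.parseLong(
        props.getProperty(MAX_INSTANT_DURATION_MS_KEY,
            String.valueOf(DEFAULT_MAX_INSTANT_DURATION_MS)));
    return completionTime -> instantTimeMinusMillis(completionTime, lookbackMs);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // No override set: falls back to the 1-day default.
    System.out.println(earliestStartTimeFunc(props).apply("20231130120000000"));
    // A blocked pipeline could widen the lookback, e.g. to 7 days.
    props.setProperty(MAX_INSTANT_DURATION_MS_KEY, String.valueOf(Duration.ofDays(7).toMillis()));
    System.out.println(earliestStartTimeFunc(props).apply("20231130120000000"));
  }
}
```

   The resulting function could then be passed to the three-argument
   `getStartTimeSet` overload in place of the hard-coded
   `MILLI_SECONDS_IN_ONE_DAY` lambda.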



##########
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/CompletionTimeQueryView.java:
##########
@@ -159,6 +166,50 @@ public Option<String> getCompletionTime(String startTime) {
       // the instant is still pending
       return Option.empty();
     }
+    loadCompletionTimeIncrementally(startTime);
+    return Option.ofNullable(this.startToCompletionInstantTimeMap.get(startTime));
+  }
+
+  /**
+   * Queries the instant start time with given completion time range.
+   *
+   * <p>By default, assumes there is at most 1 day time of duration for an instant to accelerate the queries.
+   *
+   * @param startCompletionTime The start completion time.
+   * @param endCompletionTime   The end completion time.
+   *
+   * @return The instant time set.
+   */
+  public Set<String> getStartTimeSet(String startCompletionTime, String endCompletionTime) {
+    // assumes any instant/transaction lasts at most 1 day to optimize the query efficiency.
+    return getStartTimeSet(startCompletionTime, endCompletionTime, s -> HoodieInstantTimeGenerator.instantTimeMinusMillis(s, MILLI_SECONDS_IN_ONE_DAY));
+  }
+
+  /**
+   * Queries the instant start time with given completion time range.
+   *
+   * @param startCompletionTime The start completion time.
+   * @param endCompletionTime   The end completion time.
+   *
+   * @return The instant time set.
+   */
+  public Set<String> getStartTimeSet(String startCompletionTime, String endCompletionTime, Function<String, String> earliestStartTimeFunc) {
+    String startInstant = earliestStartTimeFunc.apply(startCompletionTime);
+    final InstantRange instantRange = InstantRange.builder()
+        .rangeType(InstantRange.RangeType.CLOSE_CLOSE)

Review Comment:
   Not related to this PR: naming the `RangeType` values `OPEN`, `CLOSED`,
   `LEFT_OPEN`, and `RIGHT_OPEN` sounds more canonical. If you agree, feel free
   to open a separate PR.
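
   A rough sketch of the suggested naming, purely illustrative: the class name,
   the `contains` helper, and the inclusive/exclusive flags are assumptions and
   not the current `InstantRange.RangeType` definition. Here `CLOSED` would play
   the role of the existing `CLOSE_CLOSE`, and instants are compared as plain
   strings.

```java
class RangeTypeNamingSketch {
  // Hypothetical renaming using standard interval terminology.
  enum RangeType {
    OPEN(false, false),       // (start, end)
    CLOSED(true, true),       // [start, end], i.e. today's CLOSE_CLOSE
    LEFT_OPEN(false, true),   // (start, end]
    RIGHT_OPEN(true, false);  // [start, end)

    private final boolean startInclusive;
    private final boolean endInclusive;

    RangeType(boolean startInclusive, boolean endInclusive) {
      this.startInclusive = startInclusive;
      this.endInclusive = endInclusive;
    }

    // Returns true if the instant falls within the range under this type's bounds.
    boolean contains(String start, String end, String instant) {
      int cmpStart = instant.compareTo(start);
      int cmpEnd = instant.compareTo(end);
      boolean afterStart = startInclusive ? cmpStart >= 0 : cmpStart > 0;
      boolean beforeEnd = endInclusive ? cmpEnd <= 0 : cmpEnd < 0;
      return afterStart && beforeEnd;
    }
  }

  public static void main(String[] args) {
    String start = "20231130120000000";
    String end = "20231130130000000";
    // The start boundary is included only by CLOSED and RIGHT_OPEN.
    for (RangeType type : RangeType.values()) {
      System.out.println(type + " contains start: " + type.contains(start, end, start));
    }
  }
}
```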



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/execution/benchmark/LSMTimelineReadBenchmark.scala:
##########
@@ -42,8 +42,9 @@ object LSMTimelineReadBenchmark extends HoodieBenchmarkBase {
    * Apple M2
    * pref load archived instants:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
    * ------------------------------------------------------------------------------------------------------------------------
-   * read shim instants                                   18             32          15          0.1       17914.8       1.0X
-   * read instants with commit metadata                   19             25           5          0.1       19403.1       0.9X
+   * read slim instants                                  494            521          27          0.5        1899.6       1.0X
+   * read instants with commit metadata                 2544           2625         116          0.1        9785.9       0.2X
+   * read start time                                     156            177          26          1.7         601.1       3.2X

Review Comment:
   Just for my understanding, why is reading the timeline with start time 3.2x
   slower? I expected it to be a little slower, but 3.2x sounds like a big
   difference.
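
   For reference when reading the table, a small arithmetic check, assuming the
   `Relative` column is the first row's best time divided by each row's best
   time (the class and method names below are made up for illustration). The
   quoted figures are consistent with that reading, in which case a value above
   `1.0X` would mean less time than the baseline row.

```java
import java.util.Locale;

class RelativeColumnSketch {
  // Assumed formula: baseline best time / case best time, one decimal place.
  static String relative(double baselineBestMs, double caseBestMs) {
    return String.format(Locale.ROOT, "%.1fX", baselineBestMs / caseBestMs);
  }

  public static void main(String[] args) {
    double baseline = 494; // "read slim instants" best time (ms) from the table
    System.out.println("read instants with commit metadata: " + relative(baseline, 2544));
    System.out.println("read start time: " + relative(baseline, 156));
  }
}
```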



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
