alexeykudinkin commented on code in PR #6782:
URL: https://github.com/apache/hudi/pull/6782#discussion_r1073024689


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java:
##########
@@ -108,30 +116,94 @@ protected HoodieMergedLogRecordScanner(FileSystem fs, String basePath, List<Stri
     }
   }
 
-  protected void performScan() {
+  /**
+   * Scans delta-log files processing blocks
+   */
+  public final void scan() {
+    scan(false);
+  }
+
+  public final void scan(boolean skipProcessingBlocks) {
+    if (forceFullScan) {
+      // NOTE: When full-scan is enforced, scanning is invoked upfront (during initialization)
+      return;
+    }
+
+    scanInternal(Option.empty(), skipProcessingBlocks);
+  }
+
+  /**
+   * Provides incremental scanning capability where only provided keys will be looked
+   * up in the delta-log files, scanned and subsequently materialized into the internal
+   * cache
+   *
+   * @param keys to be looked up
+   */
+  public void scanByFullKeys(List<String> keys) {
+    if (forceFullScan) {
+      return; // no-op
+    }
+
+    List<String> missingKeys = keys.stream()
+        .filter(key -> !records.containsKey(key))
+        .collect(Collectors.toList());
+
+    if (missingKeys.isEmpty()) {
+      // All the required records are already fetched, no-op
+      return;
+    }
+
+    scanInternal(Option.of(KeySpec.fullKeySpec(missingKeys)), false);
+  }
+
+  /**
+   * Provides incremental scanning capability where only keys matching provided key-prefixes
+   * will be looked up in the delta-log files, scanned and subsequently materialized into
+   * the internal cache
+   *
+   * @param keyPrefixes to be looked up
+   */
+  public void scanByKeyPrefixes(List<String> keyPrefixes) {
+    if (forceFullScan || scannedPrefixes.containsAll(keyPrefixes)) {
+      // We can skip scanning in following 2 cases
+      //    - Reader is in full-scan mode, in which case all blocks are processed
+      //    upfront (no additional scanning is necessary)
+      //    - When same prefixes had already been handled
+      return;
+    }
+
+    // NOTE: When looking up by key-prefixes unfortunately we can't short-circuit
+    //       and will have to scan every time as we can't know (based on just
+    //       the records cached) whether particular prefix was scanned or just records
+    //       matching the prefix looked up (by [[scanByFullKeys]] API)
+    scanInternal(Option.of(KeySpec.prefixKeySpec(keyPrefixes)), false);

Review Comment:
   Discussed offline: we can do the same thing here as for the full keys and
   filter out the prefixes we've already scanned for.
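
   A minimal sketch of what that could look like, mirroring the `missingKeys` filtering in `scanByFullKeys`. This is not the PR's code: `PrefixScanSketch`, `missingPrefixes`, and `scanInvocations` are hypothetical names, `scanInternal` is stubbed to just record what it was asked to scan, and the `forceFullScan` check is left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for the scanner; only the prefix bookkeeping is modeled.
class PrefixScanSketch {
  // Prefixes that have already been scanned and materialized
  private final Set<String> scannedPrefixes = new HashSet<>();
  // Records every batch handed to the stubbed scan, so the behavior is observable
  final List<List<String>> scanInvocations = new ArrayList<>();

  public void scanByKeyPrefixes(List<String> keyPrefixes) {
    // Keep only the prefixes that no previous call has scanned for
    List<String> missingPrefixes = keyPrefixes.stream()
        .filter(prefix -> !scannedPrefixes.contains(prefix))
        .distinct()
        .collect(Collectors.toList());

    if (missingPrefixes.isEmpty()) {
      // All requested prefixes were already handled, no-op
      return;
    }

    scanInternal(missingPrefixes);
    scannedPrefixes.addAll(missingPrefixes);
  }

  // Stub: the real method would scan the delta-log blocks for these prefixes
  private void scanInternal(List<String> prefixes) {
    scanInvocations.add(prefixes);
  }
}
```

   With this shape, a call for prefixes `a, b` followed by a call for `b, c` would only scan for `c` the second time, instead of skipping the scan only when every requested prefix has been seen before.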



