nsivabalan commented on a change in pull request #3762:
URL: https://github.com/apache/hudi/pull/3762#discussion_r724642330
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java
##########
@@ -132,18 +149,31 @@ protected AbstractHoodieLogRecordScanner(FileSystem fs,
String basePath, List<St
}
this.totalLogFiles.addAndGet(logFilePaths.size());
this.logFilePaths = logFilePaths;
- this.readBlocksLazily = readBlocksLazily;
this.reverseReader = reverseReader;
+ this.readBlocksLazily = readBlocksLazily;
this.fs = fs;
this.bufferSize = bufferSize;
this.instantRange = instantRange;
this.withOperationField = withOperationField;
+ this.enableInlineReading = enableInlineReading;
+ this.enableFullScan = enableFullScan;
+ if (!enableFullScan) {
+ ValidationUtils.checkArgument(enableInlineReading, "Inline should be enabled if full scan is not enabled");
+ }
}
- /**
- * Scan Log files.
- */
public void scan() {
+ scan(Collections.emptyList());
+ }
+
+ public void scan(List<String> keys) {
+ currentInstantLogBlocks = new ArrayDeque<>();
Review comment:
One thing to be cautious about with the seek-based approach vs. a full scan:
with a full scan, we scan the log files once and build a hashmap of records, so
any number of subsequent lookups can be served without re-reading the log blocks.
But with the seek-based approach, if a user calls
scan(list of 3 keys)
scan(list of 5 keys)
we might have to read/parse through the log blocks twice, since each call
looks only for the keys it was given. So we should be cautious about using
seek-based reads for the metadata table.
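
To make the cost difference concrete, here is a minimal sketch. It is not Hudi code: `ScanCostSketch`, `ToyLogReader`, `Block`, `fullScan`, `lookup` and `seekScan` are all hypothetical names, and a "block" is just an in-memory map. It only counts how many blocks each strategy parses for the two calls above, assuming the seek-based path keeps no state between calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScanCostSketch {

    /** Hypothetical stand-in for a log block: a small batch of key -> record pairs. */
    static final class Block {
        final Map<String, String> records;

        Block(Map<String, String> records) {
            this.records = records;
        }
    }

    /** Toy reader that counts how many blocks it has parsed so far. */
    static final class ToyLogReader {
        private final List<Block> blocks;
        private Map<String, String> cache; // populated only by fullScan()
        int blocksParsed = 0;

        ToyLogReader(List<Block> blocks) {
            this.blocks = blocks;
        }

        /** Full scan: parse every block once and keep all records in a map. */
        void fullScan() {
            cache = new HashMap<>();
            for (Block block : blocks) {
                blocksParsed++;
                cache.putAll(block.records);
            }
        }

        /** Lookup after a full scan: served from the map, no block is parsed again. */
        Map<String, String> lookup(List<String> keys) {
            if (cache == null) {
                fullScan();
            }
            Map<String, String> found = new HashMap<>();
            for (String key : keys) {
                String value = cache.get(key);
                if (value != null) {
                    found.put(key, value);
                }
            }
            return found;
        }

        /** Seek-based scan (assumed stateless): every call walks the blocks again. */
        Map<String, String> seekScan(List<String> keys) {
            Map<String, String> found = new HashMap<>();
            for (Block block : blocks) {
                blocksParsed++;
                for (String key : keys) {
                    String value = block.records.get(key);
                    if (value != null) {
                        found.put(key, value);
                    }
                }
            }
            return found;
        }
    }

    /** Four toy blocks holding keys k0..k7, two keys per block. */
    static List<Block> sampleBlocks() {
        List<Block> blocks = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            Map<String, String> records = new HashMap<>();
            records.put("k" + (2 * i), "v" + (2 * i));
            records.put("k" + (2 * i + 1), "v" + (2 * i + 1));
            blocks.add(new Block(records));
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> threeKeys = Arrays.asList("k0", "k1", "k2");
        List<String> fiveKeys = Arrays.asList("k3", "k4", "k5", "k6", "k7");

        ToyLogReader full = new ToyLogReader(sampleBlocks());
        full.fullScan();
        full.lookup(threeKeys);
        full.lookup(fiveKeys);
        System.out.println("full scan, blocks parsed: " + full.blocksParsed); // 4

        ToyLogReader seek = new ToyLogReader(sampleBlocks());
        seek.seekScan(threeKeys);
        seek.seekScan(fiveKeys);
        System.out.println("seek based, blocks parsed: " + seek.blocksParsed); // 8
    }
}
```

In this toy model the full scan pays for all blocks once, while the seek-based path pays again on every call. A real seek-based reader may skip parts of a block, so the actual gap depends on how many separate scan(keys) calls are made and how much each one can skip.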