alexeykudinkin commented on code in PR #5208:
URL: https://github.com/apache/hudi/pull/5208#discussion_r842292552
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -133,14 +141,61 @@ private void initIfNeeded() {
}
@Override
-  protected List<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>> getRecordsByKeys(List<String> keys,
-                                                                                             String partitionName) {
+  public HoodieData<HoodieRecord<HoodieMetadataPayload>> getRecordsByKeyPrefixes(List<String> keyPrefixes,
+                                                                                 String partitionName) {
+    // NOTE: Since we partition records to a particular file-group by full key, we will have
+    //       to scan all file-groups for all key-prefixes as each of these might contain some
+    //       records matching the key-prefix
+    List<FileSlice> partitionFileSlices =
+        HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(metadataMetaClient, partitionName);
+
+    return engineContext.parallelize(partitionFileSlices)
+        .flatMap(
+            (SerializableFunction<FileSlice, Iterator<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>>>) fileSlice -> {
+              // we are moving the readers to executors in this code path. So, reusing readers may not make sense.
+              Pair<HoodieFileReader, HoodieMetadataMergedLogRecordReader> readers =
+                  openReadersIfNeeded(partitionName, fileSlice, false);
Review Comment:
Chatted offline: only the "files" partition could be configured to do a full scan of the logs, while "column_stats" and "bloom_filters" will have to go through point lookups.
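
To illustrate why the diff's NOTE says every file group must be scanned for a key-prefix lookup: records are routed to a file group by hashing the *full* key, so two keys sharing a prefix can land in different groups. The sketch below is a simplified stand-in (the `fileGroupFor` hash routing and the sample keys are hypothetical, not Hudi's actual key-to-file-group mapping), showing that a prefix query cannot be routed to a single group:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PrefixScanSketch {

    // Hypothetical stand-in for full-key-based file-group routing:
    // the group is derived from the hash of the ENTIRE key.
    static int fileGroupFor(String fullKey, int numFileGroups) {
        return Math.floorMod(fullKey.hashCode(), numFileGroups);
    }

    // Because routing uses the full key, a key-prefix lookup cannot be sent to
    // one group: any file group may hold matching records, so all must be scanned
    // (which is what the flatMap over partitionFileSlices in the diff does).
    static Set<Integer> fileGroupsForPrefix(List<String> allKeys, String prefix, int numFileGroups) {
        return allKeys.stream()
                .filter(k -> k.startsWith(prefix))
                .map(k -> fileGroupFor(k, numFileGroups))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Illustrative keys only; real metadata-table keys are encoded differently.
        List<String> keys = List.of("col_a.min", "col_a.max", "col_b.min", "col_b.max");
        Set<Integer> groups = fileGroupsForPrefix(keys, "col_a", 4);
        System.out.println("file groups holding prefix 'col_a': " + groups);
    }
}
```

This is also why only the "files" partition (small, bounded) can afford a configured full log scan, while "column_stats" and "bloom_filters" rely on point lookups within each file group's base file.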
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]