alexeykudinkin commented on code in PR #5208:
URL: https://github.com/apache/hudi/pull/5208#discussion_r842251622
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -133,14 +141,61 @@ private void initIfNeeded() {
}
@Override
- protected List<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>> getRecordsByKeys(List<String> keys,
-     String partitionName) {
+ public HoodieData<HoodieRecord<HoodieMetadataPayload>> getRecordsByKeyPrefixes(List<String> keyPrefixes,
+     String partitionName) {
+ // NOTE: Since we partition records to a particular file-group by full key, we will have
+ //       to scan all file-groups for all key-prefixes as each of these might contain some
+ //       records matching the key-prefix
+ List<FileSlice> partitionFileSlices =
+     HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(metadataMetaClient, partitionName);
+
+ return engineContext.parallelize(partitionFileSlices)
+     .flatMap(
+         (SerializableFunction<FileSlice, Iterator<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>>>) fileSlice -> {
+ // we are moving the readers to executors in this code path. So, reusing readers may not make sense.
+ Pair<HoodieFileReader, HoodieMetadataMergedLogRecordReader> readers =
+     openReadersIfNeeded(partitionName, fileSlice, false);
Review Comment:
Good point. Let's create a ticket so we don't forget to follow up on it.
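
For context on the NOTE in the hunk above, here is a minimal standalone sketch of why a key-prefix lookup has to visit every file group when records are placed by their full key. It is not Hudi code: `PrefixScanSketch`, `buildFileGroups`, and `lookupByPrefix` are hypothetical names, each file group is modeled as an in-memory sorted map, and placement by `hashCode` modulo the group count is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

public class PrefixScanSketch {

    // Hypothetical stand-in for file groups: each one holds its records sorted by key.
    // Assumption: a record's group is derived from a hash of its FULL key.
    static List<TreeMap<String, String>> buildFileGroups(List<String> keys, int numFileGroups) {
        List<TreeMap<String, String>> groups = new ArrayList<>();
        for (int i = 0; i < numFileGroups; i++) {
            groups.add(new TreeMap<>());
        }
        for (String key : keys) {
            int idx = Math.floorMod(key.hashCode(), numFileGroups);
            groups.get(idx).put(key, "payload-for-" + key);
        }
        return groups;
    }

    // Two keys sharing a prefix can hash to different groups, so the prefix alone
    // does not identify a group: every group has to be scanned.
    static List<String> lookupByPrefix(List<TreeMap<String, String>> groups, String prefix) {
        List<String> matches = new ArrayList<>();
        for (TreeMap<String, String> group : groups) {
            // Within one group the keys are sorted, so matching keys are contiguous
            // starting at the first key >= prefix.
            for (String key : group.tailMap(prefix, true).keySet()) {
                if (!key.startsWith(prefix)) {
                    break;
                }
                matches.add(key);
            }
        }
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("col_a/part1", "col_a/part2", "col_b/part1");
        List<TreeMap<String, String>> groups = buildFileGroups(keys, 4);
        System.out.println(lookupByPrefix(groups, "col_a/"));
    }
}
```

In this sketch the per-group scan is independent of the others, which is roughly the property that lets the real code path distribute the file slices via `engineContext.parallelize(...)`.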