alexeykudinkin commented on code in PR #5208:
URL: https://github.com/apache/hudi/pull/5208#discussion_r842292552
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -133,14 +141,61 @@ private void initIfNeeded() {
}
@Override
-  protected List<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>> getRecordsByKeys(List<String> keys,
-                                                                                             String partitionName) {
+  public HoodieData<HoodieRecord<HoodieMetadataPayload>> getRecordsByKeyPrefixes(List<String> keyPrefixes,
+                                                                                 String partitionName) {
+    // NOTE: Since we partition records to a particular file-group by full key, we will have
+    //       to scan all file-groups for all key-prefixes as each of these might contain some
+    //       records matching the key-prefix
+    List<FileSlice> partitionFileSlices =
+        HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(metadataMetaClient, partitionName);
+
+    return engineContext.parallelize(partitionFileSlices)
+        .flatMap(
+            (SerializableFunction<FileSlice, Iterator<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>>>) fileSlice -> {
+              // we are moving the readers to executors in this code path. So, reusing readers may not make sense.
+              Pair<HoodieFileReader, HoodieMetadataMergedLogRecordReader> readers =
+                  openReadersIfNeeded(partitionName, fileSlice, false);
Review Comment:
Chatted offline: only the "files" partition could be configured to do a full scan of the logs, while "column_stats" and "bloom_filters" will have to go through point lookups.
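
To illustrate why the diff's NOTE says every file group must be scanned for a key-prefix lookup: records are routed to a file group by hashing the *full* key, so two keys sharing a prefix can land in different groups. The sketch below is a simplified stand-in (the `fileGroupFor` hash routing and the sample keys are hypothetical, not Hudi's actual key-to-file-group mapping), showing that a prefix query cannot be routed to a single group:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PrefixScanSketch {

    // Hypothetical stand-in for full-key-based file-group routing:
    // the group is derived from the hash of the ENTIRE key.
    static int fileGroupFor(String fullKey, int numFileGroups) {
        return Math.floorMod(fullKey.hashCode(), numFileGroups);
    }

    // Because routing uses the full key, a key-prefix lookup cannot be sent to
    // one group: any file group may hold matching records, so all must be scanned
    // (which is what the flatMap over partitionFileSlices in the diff does).
    static Set<Integer> fileGroupsForPrefix(List<String> allKeys, String prefix, int numFileGroups) {
        return allKeys.stream()
                .filter(k -> k.startsWith(prefix))
                .map(k -> fileGroupFor(k, numFileGroups))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Illustrative keys only; real metadata-table keys are encoded differently.
        List<String> keys = List.of("col_a.min", "col_a.max", "col_b.min", "col_b.max");
        Set<Integer> groups = fileGroupsForPrefix(keys, "col_a", 4);
        System.out.println("file groups holding prefix 'col_a': " + groups);
    }
}
```

This is also why only the "files" partition (small, bounded) can afford a configured full log scan, while "column_stats" and "bloom_filters" rely on point lookups within each file group's base file.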
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]