[GitHub] [hudi] danny0405 commented on a diff in pull request #9037: [HUDI-6420] Fixing Hfile on-demand and prefix based reads to use optimized apis

via GitHub Thu, 22 Jun 2023 19:50:27 -0700


danny0405 commented on code in PR #9037:
URL: https://github.com/apache/hudi/pull/9037#discussion_r1239266143



##########
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroHFileReader.java:
##########
@@ -120,20 +120,14 @@ public HoodieAvroHFileReader(Path path, HFile.Reader reader, Option<Schema> sche
         .orElseGet(() -> Lazy.lazily(() -> fetchSchema(reader)));
   }
 
-  @Override
-  public Option<HoodieRecord<IndexedRecord>> getRecordByKey(String key, Schema readerSchema) throws IOException {
-    synchronized (sharedScannerLock) {
-      return fetchRecordByKeyInternal(sharedScanner, key, getSchema(), readerSchema)
-          .map(data -> unsafeCast(new HoodieAvroIndexedRecord(data)));
-    }
-  }
 
   @Override
   public ClosableIterator<HoodieRecord<IndexedRecord>> getRecordsByKeysIterator(List<String> keys, Schema schema) throws IOException {
     // We're caching blocks for this scanner to minimize amount of traffic
     // to the underlying storage as we fetched (potentially) sparsely distributed
     // keys
     HFileScanner scanner = getHFileScanner(reader, true);
+    scanner.seekTo(); // places the cursor at the beginning of the first data block.
     ClosableIterator<IndexedRecord> iterator = new RecordByKeyIterator(scanner, keys, getSchema(), schema);

Review Comment:
   For a full scan, should we always seek to the start again? I just thought it 
would be hard to maintain the `seekTo` call in each caller.
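
   To make the concern concrete, here is a minimal, self-contained sketch of 
positioning the cursor once where the scanner is created, so that callers do 
not each have to remember `seekTo`. It does not use the HBase or Hudi APIs: 
`ToyScanner`, `newPositionedScanner`, and `fullScan` are hypothetical names 
invented for illustration, and the in-memory key list only stands in for an 
HFile. It may or may not match how the PR should resolve this.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a toy cursor, NOT the HBase HFileScanner.
class SeekOnCreateSketch {

  // Hypothetical stand-in for a scanner over sorted keys.
  static final class ToyScanner {
    private final List<String> keys;
    private int pos = -1; // -1 means the cursor has not been positioned yet

    ToyScanner(List<String> keys) {
      this.keys = keys;
    }

    // Places the cursor on the first entry; returns false if there is none.
    boolean seekTo() {
      if (keys.isEmpty()) {
        return false;
      }
      pos = 0;
      return true;
    }

    boolean isSeeked() {
      return pos >= 0;
    }

    // Reading before positioning is the failure mode each caller must avoid.
    String current() {
      if (!isSeeked()) {
        throw new IllegalStateException("scanner not positioned");
      }
      return keys.get(pos);
    }

    // Advances the cursor; returns false once the last entry has been passed.
    boolean next() {
      if (!isSeeked()) {
        throw new IllegalStateException("scanner not positioned");
      }
      if (pos + 1 >= keys.size()) {
        return false;
      }
      pos++;
      return true;
    }
  }

  // Hypothetical factory: the single place that seeks to the start.
  static ToyScanner newPositionedScanner(List<String> keys) {
    ToyScanner scanner = new ToyScanner(keys);
    scanner.seekTo();
    return scanner;
  }

  // A caller doing a full scan no longer needs its own seekTo call.
  static List<String> fullScan(List<String> keys) {
    ToyScanner scanner = newPositionedScanner(keys);
    List<String> out = new ArrayList<>();
    if (!scanner.isSeeked()) {
      return out; // empty input: nothing to read
    }
    do {
      out.add(scanner.current());
    } while (scanner.next());
    return out;
  }
}
```

   With this shape, a new caller cannot forget the initial seek, at the cost of 
always paying for it even when the caller would immediately seek elsewhere.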



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
