[GitHub] [hudi] prashantwason commented on a diff in pull request #8758: [HUDI-53] Implementation of record_index - a HUDI index based on the metadata table.

via GitHub Wed, 14 Jun 2023 15:40:57 -0700


prashantwason commented on code in PR #8758:
URL: https://github.com/apache/hudi/pull/8758#discussion_r1230242472



##########
hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java:
##########
@@ -275,30 +243,51 @@ public Map<Pair<String, String>, HoodieMetadataColumnStats> getColumnStats(final
 
     List<String> columnStatKeys = new ArrayList<>(sortedKeys);
     HoodieTimer timer = HoodieTimer.start();
-    List<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>> hoodieRecordList =
+    Map<String, HoodieRecord<HoodieMetadataPayload>> hoodieRecords =

Review Comment:
   The eventual sorting happens deep down within the HoodieHFileDataBlock:

       // HFile read will be efficient if keys are sorted, since on storage records are sorted by key.
       // This will avoid unnecessary seeks.
       List<String> sortedKeys = new ArrayList<>(keys);
       Collections.sort(sortedKeys);
   
   I think BaseTableMetadata should not assume that the underlying
implementation is HFile and that sorting the keys therefore makes reads
faster. We should leave that to the storage block.
   
   When multiple shards are involved, the keys here would be split into
multiple smaller key ranges, with each range looked up on a single shard. So
sorting within each smaller range is all we require. For this reason too, it
makes sense to leave the sorting to the lower storage block layer.
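
To illustrate the layering being suggested, here is a minimal sketch. It is not Hudi code: `StorageBlock`, `SortedBlock`, `shardFor`, and `getRecordsByKeys` are hypothetical names, and hashing the key to pick a shard is only an assumption made for the example. The metadata layer routes keys to shards without sorting, and each block sorts only the keys it receives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShardedLookupSketch {

  /** Hypothetical storage-block abstraction; not a Hudi interface. */
  interface StorageBlock {
    Map<String, String> lookup(List<String> keys);
  }

  /** Stands in for an HFile-like block whose records are stored sorted by key. */
  static class SortedBlock implements StorageBlock {
    private final TreeMap<String, String> records = new TreeMap<>();
    // Order in which keys were actually probed, kept only for inspection.
    final List<String> probeOrder = new ArrayList<>();

    void put(String key, String value) {
      records.put(key, value);
    }

    @Override
    public Map<String, String> lookup(List<String> keys) {
      // The block knows its own layout, so the sort lives here and only
      // covers the (smaller) set of keys routed to this block.
      List<String> sortedKeys = new ArrayList<>(keys);
      Collections.sort(sortedKeys);
      Map<String, String> found = new HashMap<>();
      for (String key : sortedKeys) {
        probeOrder.add(key);
        String value = records.get(key);
        if (value != null) {
          found.put(key, value);
        }
      }
      return found;
    }
  }

  /** Hypothetical key-to-shard mapping (assumption: plain hash partitioning). */
  static int shardFor(String key, int numShards) {
    return Math.floorMod(key.hashCode(), numShards);
  }

  /** The metadata layer only routes keys to shards; it does not sort them. */
  static Map<String, String> getRecordsByKeys(List<String> keys,
                                              List<? extends StorageBlock> shards) {
    Map<Integer, List<String>> keysByShard = new HashMap<>();
    for (String key : keys) {
      keysByShard
          .computeIfAbsent(shardFor(key, shards.size()), s -> new ArrayList<>())
          .add(key);
    }
    Map<String, String> result = new HashMap<>();
    keysByShard.forEach(
        (shard, shardKeys) -> result.putAll(shards.get(shard).lookup(shardKeys)));
    return result;
  }

  public static void main(String[] args) {
    List<SortedBlock> shards = Arrays.asList(new SortedBlock(), new SortedBlock());
    List<String> keys = Arrays.asList("k9", "k3", "k7", "k1", "k5");
    for (String key : keys) {
      shards.get(shardFor(key, shards.size())).put(key, "v-" + key);
    }
    System.out.println(getRecordsByKeys(keys, shards));
    for (SortedBlock shard : shards) {
      System.out.println(shard.probeOrder);
    }
  }
}
```

With this split, a block implementation that does not benefit from sorted access could simply skip the sort, and the caller would not need to change.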



