prashantwason commented on code in PR #8758:
URL: https://github.com/apache/hudi/pull/8758#discussion_r1230242472
##########
hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java:
##########
@@ -275,30 +243,51 @@ public Map<Pair<String, String>, HoodieMetadataColumnStats> getColumnStats(final
List<String> columnStatKeys = new ArrayList<>(sortedKeys);
HoodieTimer timer = HoodieTimer.start();
- List<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>> hoodieRecordList =
+ Map<String, HoodieRecord<HoodieMetadataPayload>> hoodieRecords =
Review Comment:
The eventual sorting happens deep down within HoodieHFileDataBlock:

    // HFile read will be efficient if keys are sorted, since on storage records are sorted by key.
    // This will avoid unnecessary seeks.
    List<String> sortedKeys = new ArrayList<>(keys);
    Collections.sort(sortedKeys);
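To make the seek argument concrete, here is a minimal Java sketch. `SortedFileReader` is a hypothetical stand-in for an HFile-style reader, not Hudi's API: records are stored sorted by key, the cursor moves forward cheaply, and a lookup for a key that lies behind the cursor is modelled as an extra seek back to the start.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of an HFile-style reader; not Hudi's HoodieHFileDataBlock API.
class SortedFileReader {
  private final List<String> records; // stored sorted by key, as on storage
  private int cursor = 0;             // index of the next record a forward scan would read
  private int backwardSeeks = 0;      // lookups that had to restart behind the cursor

  SortedFileReader(List<String> keys) {
    this.records = new ArrayList<>(keys);
    Collections.sort(this.records);
  }

  // Returns true if the key exists. Every record before the cursor is smaller
  // than the previous lookup key, so a key at or below records[cursor - 1]
  // can only be reached by seeking back to the start.
  boolean lookup(String key) {
    if (cursor > 0 && key.compareTo(records.get(cursor - 1)) <= 0) {
      cursor = 0;
      backwardSeeks++;
    }
    while (cursor < records.size() && records.get(cursor).compareTo(key) < 0) {
      cursor++; // cheap forward step
    }
    return cursor < records.size() && records.get(cursor).equals(key);
  }

  int backwardSeeks() {
    return backwardSeeks;
  }

  // Runs the lookups in the given order and reports how many backward seeks they caused.
  static int countBackwardSeeks(List<String> fileKeys, List<String> lookupKeys) {
    SortedFileReader reader = new SortedFileReader(fileKeys);
    for (String key : lookupKeys) {
      reader.lookup(key);
    }
    return reader.backwardSeeks();
  }
}
```

For example, looking up d, b, e, a against a file holding a through e costs two backward seeks in this model, while the same keys in sorted order cost none.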
I think BaseTableMetadata should not assume that the underlying implementation is HFile, and therefore that sorting the keys makes lookups faster. We should leave that decision to the storage block.
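A sketch of what leaving the sorting to the storage block could look like, assuming a hypothetical `KeyLookupBlock` interface (none of these names are Hudi classes). The caller hands over keys in whatever order it has them; only the sorted-file implementation sorts, because only it benefits from sorted lookups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical storage-block abstraction: callers pass keys in any order.
interface KeyLookupBlock {
  Map<String, String> lookup(Collection<String> keys);
}

// Hypothetical HFile-like block whose records are sorted by key on storage.
// Sorting the requested keys is its own implementation detail.
class SortedFileBlock implements KeyLookupBlock {
  private final TreeMap<String, String> records;
  private final List<String> lastScanOrder = new ArrayList<>();

  SortedFileBlock(Map<String, String> data) {
    this.records = new TreeMap<>(data);
  }

  @Override
  public Map<String, String> lookup(Collection<String> keys) {
    List<String> sortedKeys = new ArrayList<>(keys);
    Collections.sort(sortedKeys); // sorted lookups match the on-storage order
    lastScanOrder.clear();
    Map<String, String> found = new HashMap<>();
    for (String key : sortedKeys) {
      lastScanOrder.add(key);
      String value = records.get(key);
      if (value != null) {
        found.put(key, value);
      }
    }
    return found;
  }

  // Order in which the most recent lookup visited the keys.
  List<String> lastScanOrder() {
    return lastScanOrder;
  }
}

// Hypothetical hash-indexed block: lookup order is irrelevant, so it skips sorting.
class HashIndexedBlock implements KeyLookupBlock {
  private final Map<String, String> records;

  HashIndexedBlock(Map<String, String> data) {
    this.records = new HashMap<>(data);
  }

  @Override
  public Map<String, String> lookup(Collection<String> keys) {
    Map<String, String> found = new HashMap<>();
    for (String key : keys) {
      String value = records.get(key);
      if (value != null) {
        found.put(key, value);
      }
    }
    return found;
  }
}
```

With this split, the metadata layer would simply call `block.lookup(keys)` and stay agnostic of how the block is laid out on storage.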
When multiple shards are involved, the keys here would be split into multiple smaller key ranges, with each range looked up on a single shard, so sorting within each smaller range is all we require. This is another reason to leave the sorting to the lower storage-block layer.
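The sharded case can be sketched the same way. `ShardedLookupPlanner` and its hash-based `shardFor` mapping are hypothetical and not Hudi's actual key-to-file-group mapping; the point is only that each shard's subset is sorted independently, so a global sort in BaseTableMetadata adds nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShardedLookupPlanner {
  // Hypothetical key-to-shard mapping; the real mapping may differ.
  static int shardFor(String key, int numShards) {
    return Math.floorMod(key.hashCode(), numShards);
  }

  // Splits keys into per-shard groups and sorts each group independently.
  static Map<Integer, List<String>> planLookups(Collection<String> keys, int numShards) {
    Map<Integer, List<String>> byShard = new TreeMap<>();
    for (String key : keys) {
      byShard.computeIfAbsent(shardFor(key, numShards), s -> new ArrayList<>()).add(key);
    }
    // Each shard only needs its own subset sorted; no global sort is required.
    for (List<String> group : byShard.values()) {
      Collections.sort(group);
    }
    return byShard;
  }
}
```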
--
This is an automated message from the Apache Git Service.