manojpec commented on a change in pull request #4352:
URL: https://github.com/apache/hudi/pull/4352#discussion_r796945603
##########
File path:
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java
##########
@@ -125,30 +128,43 @@ private void initIfNeeded() {
return recordsByKeys.size() == 0 ? Option.empty() :
recordsByKeys.get(0).getValue();
}
- protected List<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>> getRecordsByKeys(List<String> keys, String partitionName) {
- Pair<HoodieFileReader, HoodieMetadataMergedLogRecordReader> readers = openReadersIfNeeded(keys.get(0), partitionName);
- try {
- List<Long> timings = new ArrayList<>();
- HoodieFileReader baseFileReader = readers.getKey();
- HoodieMetadataMergedLogRecordReader logRecordScanner = readers.getRight();
+ @Override
+ protected List<Pair<String, Option<HoodieRecord<HoodieMetadataPayload>>>> getRecordsByKeys(List<String> keys,
+ String partitionName) {
+ Map<Pair<String, FileSlice>, List<String>> partitionFileSliceToKeysMap = getPartitionFileSlices(partitionName, keys);
Review comment:
Caching getPartitionFileSlices() can lead to stale entries in the cache.
Whether caching is safe depends on how the user wants to read this partition.
When reader reuse is enabled, the caller is effectively saying that the file
slices are not changing and can be served from the cache. In all other cases,
we always need to fetch the latest file slices; otherwise, we might miss the
latest entries.
We need a larger, performance-specific refactoring here, and it is not
related to indexing. HUDI-3300 and HUDI-3301 will address it.
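To make the trade-off above concrete, here is a minimal Java sketch of a "cache only when reuse is opted in" lookup. `FileSliceLookup`, the `reuseEnabled` flag and the `fetchLatest` loader are hypothetical names for illustration, not Hudi's API, and file slices are modeled as plain strings. With reuse enabled, the first listing of a partition is cached and later calls return it even if new slices have since been committed. With reuse disabled, every call goes back to the loader.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration only: these are not Hudi's actual classes or methods.
// Cached file-slice listings are served only when the caller has opted into reader reuse.
class FileSliceLookup {
  // Hypothetical flag mirroring "reader reuse is enabled".
  private final boolean reuseEnabled;
  // Hypothetical loader: partition name -> latest file-slice ids.
  private final Function<String, List<String>> fetchLatest;
  // Per-partition cache, consulted only when reuse is enabled.
  private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

  FileSliceLookup(boolean reuseEnabled, Function<String, List<String>> fetchLatest) {
    this.reuseEnabled = reuseEnabled;
    this.fetchLatest = fetchLatest;
  }

  List<String> fileSlices(String partitionName) {
    if (reuseEnabled) {
      // The caller asserts the file slices do not change: load once, then serve from the cache.
      return cache.computeIfAbsent(partitionName, fetchLatest);
    }
    // No such guarantee: always fetch, so newly committed slices are not missed.
    return fetchLatest.apply(partitionName);
  }
}
```

In the reuse case the stale read is the intended contract rather than a bug: it is acceptable only because the caller has declared the file slices fixed for the lifetime of the reader.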