linliu-code commented on code in PR #9581:
URL: https://github.com/apache/hudi/pull/9581#discussion_r1349221039
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java:
##########
@@ -173,18 +173,18 @@ public static <R> HoodieRecord<R> tagRecord(HoodieRecord<R> record, HoodieRecord
*
* @param filePath - File to filter keys from
* @param candidateRecordKeys - Candidate keys to filter
- * @return List of candidate keys that are available in the file
+ * @return List of pairs of candidate keys and positions that are available in the file
*/
- public static List<String> filterKeysFromFile(Path filePath, List<String> candidateRecordKeys,
- Configuration configuration) throws HoodieIndexException {
+ public static List<Pair<String, Long>> filterKeysFromFile(Path filePath, List<String> candidateRecordKeys,
+ Configuration configuration) throws HoodieIndexException {
ValidationUtils.checkArgument(FSUtils.isBaseFile(filePath));
- List<String> foundRecordKeys = new ArrayList<>();
+ List<Pair<String, Long>> foundRecordKeys = new ArrayList<>();
try (HoodieFileReader fileReader = HoodieFileReaderFactory.getReaderFactory(HoodieRecordType.AVRO)
.getFileReader(configuration, filePath)) {
// Load all rowKeys from the file, to double-confirm
if (!candidateRecordKeys.isEmpty()) {
HoodieTimer timer = HoodieTimer.start();
- Set<String> fileRowKeys = fileReader.filterRowKeys(new TreeSet<>(candidateRecordKeys));
+ Set<Pair<String, Long>> fileRowKeys = fileReader.filterRowKeys(candidateRecordKeys.stream().collect(Collectors.toSet()));
Review Comment:
Is this change solely for performance purposes?
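
For context, here is a minimal sketch of how the two set constructions in the diff differ. It is not Hudi code: the class and method names are hypothetical, and it only exercises standard `java.util` behavior. `new TreeSet<>(keys)` deduplicates and keeps the keys in sorted order, while `Collectors.toSet()` deduplicates but makes no guarantee about the returned set's type or iteration order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
public class SetConstructionSketch {

  // Old form in the diff: deduplicates and iterates in sorted (natural) order.
  public static Set<String> viaTreeSet(List<String> keys) {
    return new TreeSet<>(keys);
  }

  // New form in the diff: deduplicates; iteration order is unspecified.
  public static Set<String> viaToSet(List<String> keys) {
    return keys.stream().collect(Collectors.toSet());
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("key3", "key1", "key2", "key1");
    // Both sets hold the same three distinct keys.
    System.out.println(viaTreeSet(keys).equals(viaToSet(keys)));
    // Only the TreeSet guarantees sorted iteration.
    System.out.println(viaTreeSet(keys));
  }
}
```

If anything downstream of `filterRowKeys` relies on sorted key order, the two forms would not be interchangeable; if not, the difference is mainly the cost of building the set.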