manojpec commented on a change in pull request #4352:
URL: https://github.com/apache/hudi/pull/4352#discussion_r796236319
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java
##########
@@ -101,4 +116,34 @@ public static HoodieRecord getTaggedRecord(HoodieRecord inputRecord, Option<Hood
}
return record;
}
+
+ /**
+ * Given a list of row keys and one file, return only row keys existing in that file.
+ *
+ * @param filePath - File to filter keys from
+ * @param candidateRecordKeys - Candidate keys to filter
+ * @return List of candidate keys that are available in the file
+ */
+ public static List<String> filterKeysFromFile(Path filePath, List<String> candidateRecordKeys,
+ Configuration configuration) throws HoodieIndexException {
+ ValidationUtils.checkArgument(FSUtils.isBaseFile(filePath));
+ List<String> foundRecordKeys = new ArrayList<>();
+ try {
+ // Load all rowKeys from the file, to double-confirm
+ if (!candidateRecordKeys.isEmpty()) {
+ HoodieTimer timer = new HoodieTimer().startTimer();
+ HoodieFileReader fileReader = HoodieFileReaderFactory.getFileReader(configuration, filePath);
+ Set<String> fileRowKeys = fileReader.filterRowKeys(new TreeSet<>(candidateRecordKeys));
Review comment:
HUDI-3203 will address this. The bloom filter needs to be constructed
elsewhere, where the property can be read. It's not going to happen at this
place.
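
For context, the filtering step in the hunk above amounts to intersecting the
sorted candidate keys with the keys present in the file. A minimal standalone
sketch of that idea follows. It assumes an in-memory Set stands in for the
file reader, and the class and method names (KeyFilterSketch, filterKeys) are
hypothetical illustrations, not Hudi APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: not part of Hudi. An in-memory Set replaces the
// keys that a real file reader would load from a base file.
final class KeyFilterSketch {

  private KeyFilterSketch() {
  }

  /**
   * Returns the candidate keys that also appear in fileKeys,
   * in sorted order and without duplicates.
   */
  static List<String> filterKeys(List<String> candidateKeys, Set<String> fileKeys) {
    List<String> found = new ArrayList<>();
    if (candidateKeys.isEmpty()) {
      // Nothing to look up, so skip the scan entirely.
      return found;
    }
    // Sort and de-duplicate the candidates, mirroring the TreeSet in the hunk.
    for (String key : new TreeSet<>(candidateKeys)) {
      if (fileKeys.contains(key)) {
        found.add(key);
      }
    }
    return found;
  }
}
```

Calling filterKeys with candidates k3, k1, k2 and file keys k1, k3, k9 keeps
only k1 and k3, in sorted order.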