lamber-ken edited a comment on issue #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching
URL: https://github.com/apache/incubator-hudi/pull/1469#issuecomment-612499511

Hi @vinothchandar, based on your branch, I made the following updates:
- Rebased onto the master branch
- Added TestHoodieBloomIndexV2.java
- Added DeltaTimer.java
- Fixed an implicit bug that causes input records to be repeated

**Bug fix**

In the double-check stage (`HoodieBloomIndexV2.LazyKeyChecker#computeNext`), when the target file doesn't contain the record key, the lookup should return `Option.empty()`.

**Previous**
```
Option<HoodieRecord<T>> ret = fileIdOpt.map(fileId -> {
  if (currHandle == null || !currHandle.getFileId().equals(fileId)) {
    currHandle = new HoodieKeyLookupHandle<>(config, table, Pair.of(record.getPartitionPath(), fileId));
  }
  Option<HoodieRecordLocation> location = currHandle.containsKey(record.getRecordKey())
      ? Option.of(new HoodieRecordLocation(currHandle.getBaseInstantTime(), currHandle.getFileId()))
      : Option.empty();
  // The record is returned even when the file does not contain the key.
  return Option.of(getTaggedRecord(record, location));
}).orElse(Option.of(record));
```

**Changes**
```
Option<HoodieRecord<T>> recordOpt = fileIdOpt.map((Function<String, Option<HoodieRecord<T>>>) fileId -> {
  DeltaTimer deltaTimer = new DeltaTimer();
  if (currHandle == null || !currHandle.getFileId().equals(fileId)) {
    currHandle = new HoodieKeyLookupHandle<>(config, table, Pair.of(record.getPartitionPath(), fileId));
  }
  totalReadTimeMs += deltaTimer.deltaTime();
  if (currHandle.containsKey(record.getRecordKey())) {
    HoodieRecordLocation recordLocation = new HoodieRecordLocation(currHandle.getBaseInstantTime(), currHandle.getFileId());
    return Option.of(getTaggedRecord(record, recordLocation));
  } else {
    // The file does not contain the key: return nothing for this lookup.
    return Option.empty();
  }
}).orElse(Option.of(record));
```
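To illustrate why the `Option.empty()` branch matters, here is a minimal, self-contained sketch. It is not Hudi code. It assumes that one record key can be paired with more than one candidate file, so the check runs once per (key, file) pair. `DoubleCheckSketch`, `FILES`, `previous`, `changed`, and `run` are hypothetical names, records are modeled as plain strings, and `java.util.Optional` stands in for Hudi's `Option`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class DoubleCheckSketch {
    // Hypothetical stand-in for base files: fileId -> keys actually stored in that file.
    static final Map<String, Set<String>> FILES = new HashMap<>();
    static {
        FILES.put("file-1", new HashSet<>(Arrays.asList("key-a")));
        FILES.put("file-2", new HashSet<>(Arrays.asList("key-b")));
    }

    // Shape of the "Previous" snippet: a record is always returned,
    // tagged if the file contains the key and untagged otherwise.
    static Optional<String> previous(String key, Optional<String> fileIdOpt) {
        return fileIdOpt.<Optional<String>>map(fileId -> {
            String result = FILES.get(fileId).contains(key)
                ? key + "@" + fileId
                : key + "@untagged";
            return Optional.of(result);
        }).orElse(Optional.of(key + "@untagged"));
    }

    // Shape of the "Changes" snippet: nothing is returned when the
    // candidate file turns out not to contain the key.
    static Optional<String> changed(String key, Optional<String> fileIdOpt) {
        return fileIdOpt.<Optional<String>>map(fileId -> {
            if (FILES.get(fileId).contains(key)) {
                return Optional.of(key + "@" + fileId);
            } else {
                return Optional.empty();
            }
        }).orElse(Optional.of(key + "@untagged"));
    }

    // Runs the check once per candidate file and collects whatever is emitted.
    static List<String> run(String key, List<String> candidateFiles, boolean useChanged) {
        List<String> out = new ArrayList<>();
        for (String fileId : candidateFiles) {
            Optional<String> result = useChanged
                ? changed(key, Optional.of(fileId))
                : previous(key, Optional.of(fileId));
            result.ifPresent(out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("file-1", "file-2");
        System.out.println(run("key-a", candidates, false)); // [key-a@file-1, key-a@untagged]
        System.out.println(run("key-a", candidates, true));  // [key-a@file-1]
    }
}
```

Under these assumptions, the previous shape emits the same input record twice (once tagged, once untagged), while the changed shape emits it only for the file that really holds the key.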
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services