danny0405 commented on a change in pull request #2319:
URL: https://github.com/apache/hudi/pull/2319#discussion_r540808224
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bloom/SparkHoodieBloomIndex.java
##########
@@ -122,13 +122,15 @@ public SparkHoodieBloomIndex(HoodieWriteConfig config) {
// Step 3: Obtain a RDD, for each incoming record, that already exists, with the file id,
// that contains it.
+ JavaRDD<Tuple2<String, HoodieKey>> fileComparisonsRDD =
Review comment:
Thanks for the explanation. The `recordRDD` may be persisted, but the
computation in `explodeRecordRDDWithFileComparisons` still needs to run twice,
right?
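
To make the concern concrete, here is a minimal plain-Java sketch of the behaviour in question. It is a model of lazy re-evaluation, not Spark code and not the code in this PR: `LazyRecomputeSketch`, `explode`, `memoize`, and `runTwoActions` are hypothetical names, `explode` only stands in for an expensive step such as `explodeRecordRDDWithFileComparisons`, and `memoize` only approximates what persisting the derived data set would do. The assumption being illustrated is that the derived result is consumed by two separate actions.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Plain-Java model of lazy re-evaluation; this is NOT Spark code.
class LazyRecomputeSketch {

    // Counts how often the (hypothetical) expensive "explode" step runs.
    static final AtomicInteger explodeCalls = new AtomicInteger();

    // Stand-in for an expensive transformation over the input records.
    static List<String> explode(List<String> records) {
        explodeCalls.incrementAndGet();
        return records.stream()
                .map(r -> "file-1:" + r)
                .collect(Collectors.toList());
    }

    // Wraps a supplier so its result is computed once and then reused
    // (a rough model of persisting the derived data set).
    static <T> Supplier<T> memoize(Supplier<T> source) {
        return new Supplier<T>() {
            private T cached;
            private boolean done;

            @Override
            public T get() {
                if (!done) {
                    cached = source.get();
                    done = true;
                }
                return cached;
            }
        };
    }

    // Runs two "actions" on the derived data and returns how many times
    // the explode step was executed.
    static int runTwoActions(boolean persistDerived) {
        explodeCalls.set(0);
        // The input is already materialized, like a persisted recordRDD.
        List<String> records = List.of("k1", "k2", "k3");
        // The derived data is lazy: each get() re-runs explode.
        Supplier<List<String>> derived = () -> explode(records);
        if (persistDerived) {
            derived = memoize(derived);
        }
        int first = derived.get().size();  // action 1
        int second = derived.get().size(); // action 2
        return explodeCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("derived not persisted: " + runTwoActions(false));
        System.out.println("derived persisted:     " + runTwoActions(true));
    }
}
```

In this model, caching only the input leaves the explode step running once per action, while caching the derived result runs it once in total. Whether the same applies here depends on whether `fileComparisonsRDD` really is consumed by more than one action.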