danny0405 commented on a change in pull request #2319:
URL: https://github.com/apache/hudi/pull/2319#discussion_r540751159
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bloom/SparkHoodieBloomIndex.java
##########
@@ -122,13 +122,15 @@ public SparkHoodieBloomIndex(HoodieWriteConfig config) {
// Step 3: Obtain a RDD, for each incoming record, that already exists, with the file id,
// that contains it.
+ JavaRDD<Tuple2<String, HoodieKey>> fileComparisonsRDD =
Review comment:
I haven't checked the Spark UI yet; this is just a simple analysis of the
data-writing process. For each batch of records to write,
`SparkHoodieBloomIndex.lookupIndex` is expected to be invoked once, so
`fileComparisonsRDD` should be evaluated only once. Is there another
invocation of `SparkHoodieBloomIndex.lookupIndex`? Maybe I missed something.
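
To make the question concrete, here is a minimal plain-Java sketch (no Spark involved) of the general behaviour I have in mind: a lazily defined computation is re-run for every action that consumes it unless the result is cached, so caching only helps when there is more than one consumer. The class `LazyEvaluationSketch`, the helpers `memoize` and `countEvaluations`, and the sample strings are all hypothetical and only stand in for `fileComparisonsRDD` and `persist()`; this is not the Hudi code path.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class LazyEvaluationSketch {

    // Hypothetical stand-in for persist(): the delegate runs at most once.
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private T value;
            private boolean computed;

            @Override
            public synchronized T get() {
                if (!computed) {
                    value = delegate.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    // Triggers `actions` consumers and reports how often the lazy
    // computation was actually evaluated.
    static int countEvaluations(int actions, boolean cached) {
        AtomicInteger evaluations = new AtomicInteger();

        // Hypothetical stand-in for fileComparisonsRDD: nothing runs
        // until get() is called.
        Supplier<List<String>> fileComparisons = () -> {
            evaluations.incrementAndGet();
            return Arrays.asList("file-1:key-a", "file-2:key-b");
        };

        Supplier<List<String>> source =
            cached ? memoize(fileComparisons) : fileComparisons;

        for (int i = 0; i < actions; i++) {
            source.get().size(); // each get() plays the role of one action
        }
        return evaluations.get();
    }

    public static void main(String[] args) {
        System.out.println("1 action, uncached:  " + countEvaluations(1, false));
        System.out.println("2 actions, uncached: " + countEvaluations(2, false));
        System.out.println("2 actions, cached:   " + countEvaluations(2, true));
    }
}
```

If `lookupIndex` really runs once per batch and `fileComparisonsRDD` feeds a single action, this corresponds to the first case, where caching would not save any work.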