bvaradar commented on issue #1833: URL: https://github.com/apache/hudi/issues/1833#issuecomment-659892874
@tooptoop4: Can you provide us the Spark DAGs with timings (job, stage, and task level) for 0.5.3 with the bucketized bloom index on and for 0.5.3 with the bucketized bloom index off? We need to see why you are seeing such a massive performance difference.

Regarding your question, please take a look at the comment here: https://github.com/apache/hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java#L249
This is basically an exploded RDD of record keys paired with the files each key has to be compared against.

Thanks,
Balaji.V

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [email protected]
