[ https://issues.apache.org/jira/browse/HUDI-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062781#comment-17062781 ]
Vinoth Chandar commented on HUDI-686:
-------------------------------------
Running a local microbenchmark, I found that the extra shuffling in the V2
implementation (which shuffles the input twice, versus caching once and
shuffling just the keys in the current implementation) makes it a tad slower.
*BloomIndexV2*
*!Screen Shot 2020-03-19 at 10.15.10 AM.png!*
*BloomIndex*
*!image-2020-03-19-10-17-43-048.png!*
> Implement BloomIndexV2 that does not depend on memory caching
> -------------------------------------------------------------
>
> Key: HUDI-686
> URL: https://issues.apache.org/jira/browse/HUDI-686
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Index, Performance
> Reporter: Vinoth Chandar
> Assignee: Vinoth Chandar
> Priority: Major
> Fix For: 0.6.0
>
> Attachments: Screen Shot 2020-03-19 at 10.15.10 AM.png, Screen Shot
> 2020-03-19 at 10.15.10 AM.png, Screen Shot 2020-03-19 at 10.15.10 AM.png,
> image-2020-03-19-10-17-43-048.png
>
>
> The main goal here is to provide a much simpler index, without advanced
> optimizations like auto-tuned parallelism or skew handling, but with a better
> out-of-the-box experience for small workloads.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)