alexeykudinkin commented on issue #6188: URL: https://github.com/apache/hudi/issues/6188#issuecomment-1230832162
Hey, @floriandaniel! Thanks for taking the time to file such a detailed description.

First of all, I believe the crux of the problem likely lies in using the Bloom Index of the Metadata table: we've recently identified a performance gap there and @yihua is currently working on addressing it (there's already a PR in progress).

Second, I'd recommend doing the following in your evaluation:

1. Try Hudi 0.12, which was recently released (we did a lot of performance benchmarking and optimization during the last release cycle specifically to make sure Hudi's performance is top of the line).
2. Disable `hoodie.bloom.index.use.metadata` for now (until the above fix lands).
3. Is there any particular reason you're switching off `hoodie.bloom.index.prune.by.ranges`? It's a crucial aspect of the Bloom Index: for update-heavy workloads it prunes the search space considerably by checking only the files that could contain the target records (and eliminating the ones that couldn't).
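
For anyone trying the suggestions above, here is a rough sketch of what the two settings might look like as write options, plus a toy illustration of why range pruning helps. The two `hoodie.bloom.index.*` keys come from this thread; the helper names, the sample file names, and the key ranges are hypothetical and only for illustration. The pruning function is a simplified model of the idea, not Hudi's actual implementation.

```python
from typing import Dict, List, Tuple

# Settings suggested in the comment; both keys appear in the thread.
RECOMMENDED_BLOOM_OPTIONS: Dict[str, str] = {
    "hoodie.bloom.index.use.metadata": "false",    # off until the fix lands
    "hoodie.bloom.index.prune.by.ranges": "true",  # keep range pruning on
}


def with_recommended_bloom_options(base: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `base` with the recommended settings applied on top."""
    merged = dict(base)
    merged.update(RECOMMENDED_BLOOM_OPTIONS)
    return merged


def candidate_files(
    record_key: str, file_key_ranges: Dict[str, Tuple[str, str]]
) -> List[str]:
    """Toy model of range pruning: keep only the files whose [min, max]
    record-key range could contain `record_key`."""
    return [
        name
        for name, (lo, hi) in file_key_ranges.items()
        if lo <= record_key <= hi
    ]


if __name__ == "__main__":
    # Hypothetical existing options where pruning had been switched off.
    opts = with_recommended_bloom_options(
        {"hoodie.bloom.index.prune.by.ranges": "false"}
    )
    print(opts)

    # Hypothetical per-file key ranges; only overlapping files survive.
    ranges = {
        "file-a": ("key_000", "key_199"),
        "file-b": ("key_200", "key_399"),
        "file-c": ("key_150", "key_250"),
    }
    print(candidate_files("key_220", ranges))

    # With PySpark, the merged options would typically be passed as:
    #   df.write.format("hudi").options(**opts).mode("append").save(path)
    # where `df` and `path` are whatever the job already uses.
```

In this toy model only the files that survive the range check would then need their Bloom filters consulted, which is the search-space reduction described in point 3.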
