GitHub user zhouyuan added a comment to the discussion: [Performance] Velox Bloom Filter Inefficiency vs. Photon at 1TB Scale
Hi @shadowmmu,

Thanks for the detailed information. I see the issue now. This appears to be a hard limit in the memory allocator, and modifying this line may help: https://github.com/facebookincubator/velox/blob/main/velox/common/memory/MemoryAllocator.h#L492

Cc: @zhli1142015

For BHJ (broadcast hash join), Gluten has a WIP patch (https://github.com/apache/incubator-gluten/pull/8931) that should significantly improve performance for large BHJ workloads.

On the query planning side, DBX performs much better than vanilla Spark, especially on the DS benchmark. There are also several useful optimizations from the Spark community that have not been merged yet. Some cloud vendors have also improved Spark Catalyst in their products to get a better planner.

For the Aggregation perf diff, could you please also share the example queries?

thanks, -yuan

GitHub link: https://github.com/apache/incubator-gluten/discussions/11554#discussioncomment-15686284
