GitHub user zhli1142015 added a comment to the discussion: [Performance] Velox Bloom Filter Inefficiency vs. Photon at 1TB Scale
> Hi @shadowmmu
>
> Thanks for the detailed information. I see the issue now. This appears to be
> a hard limit in the memory allocator, and modifying this line may help:
> https://github.com/facebookincubator/velox/blob/main/velox/common/memory/MemoryAllocator.h#L492
>
> Cc: @zhli1142015
>
> To improve BHJ performance, Gluten has a WIP patch (#8931) that should
> significantly improve performance for large BHJ workloads.
>
> On the query planning side, DBX performs much better than vanilla Spark,
> especially on the DS benchmark. There are also several useful optimizations
> from the Spark community that have not been merged yet. Some cloud vendors
> have also improved Spark Catalyst in their products to get a better planner.
>
> For the aggregation performance difference, could you please also share the
> example queries?
>
> thanks, -yuan

Internally, we've removed this limitation and are using Spark's default value of 64 MB for bloomFilter.maxNumBits.

@shadowmmu Is 1 GB the default setting for all queries? What is the overall impact of this?

GitHub link: https://github.com/apache/incubator-gluten/discussions/11554#discussioncomment-15688885
